Abstract
We apply Long Short-Term Memory (LSTM) recurrent neural networks to a large corpus of unprompted speech: the German part of the VERBMOBIL corpus. By training first on a fraction of the data, then retraining on another fraction, we both reduce time costs and significantly improve recognition rates. For comparison we show recognition rates of Hidden Markov Models (HMMs) on the same corpus, and provide a promising extrapolation for HMM-LSTM hybrids.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Beringer, N., Graves, A., Schiel, F., Schmidhuber, J. (2005). Classifying Unprompted Speech by Retraining LSTM Nets. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds) Artificial Neural Networks: Biological Inspirations – ICANN 2005. ICANN 2005. Lecture Notes in Computer Science, vol 3696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550822_90
DOI: https://doi.org/10.1007/11550822_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28752-0
Online ISBN: 978-3-540-28754-4
eBook Packages: Computer Science (R0)