
Classifying Unprompted Speech by Retraining LSTM Nets

  • Nicole Beringer
  • Alex Graves
  • Florian Schiel
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3696)

Abstract

We apply Long Short-Term Memory (LSTM) recurrent neural networks to a large corpus of unprompted speech: the German part of the VERBMOBIL corpus. By training first on a fraction of the data, then retraining on another fraction, we both reduce time costs and significantly improve recognition rates. For comparison, we show recognition rates of Hidden Markov Models (HMMs) on the same corpus, and provide a promising extrapolation for HMM-LSTM hybrids.
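
The two-stage schedule described above (train on one fraction of the data, then retrain on another) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: it replaces the LSTM network with a tiny logistic classifier, uses synthetic two-dimensional "frames" instead of VERBMOBIL features, and the names `make_frames`, `train`, and `accuracy` are hypothetical. It shows only the warm-start pattern, not the authors' implementation, and says nothing about the recognition rates reported in the paper.

```python
import math
import random


def make_frames(n, seed):
    """Synthetic 2-D 'frames' with binary labels (a stand-in for speech features)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def train(weights, data, epochs=5, lr=0.1):
    """Run SGD on a logistic classifier, starting from the given weights."""
    w = list(weights)  # copy so the caller's weights are not mutated
    for _ in range(epochs):
        for x, y in data:
            z = w[0] + w[1] * x[0] + w[2] * x[1]
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w[0] -= lr * g
            w[1] -= lr * g * x[0]
            w[2] -= lr * g * x[1]
    return w


def accuracy(w, data):
    """Fraction of frames whose predicted class matches the label."""
    hits = sum((w[0] + w[1] * x[0] + w[2] * x[1] > 0) == bool(y) for x, y in data)
    return hits / len(data)


corpus = make_frames(2000, seed=0)
held_out = make_frames(500, seed=1)

# Two disjoint fractions of the corpus; the split sizes are arbitrary.
fraction_a, fraction_b = corpus[:500], corpus[500:1000]

# Stage 1: train from scratch on the first fraction.
w_stage1 = train([0.0, 0.0, 0.0], fraction_a)

# Stage 2: retrain on the second fraction, warm-started from the stage 1 weights.
w_stage2 = train(w_stage1, fraction_b)

print("stage 1 accuracy:", accuracy(w_stage1, held_out))
print("stage 2 accuracy:", accuracy(w_stage2, held_out))
```

The point of the pattern is that stage 2 starts from the stage 1 weights rather than from a fresh initialisation, so each stage only has to pass over a fraction of the corpus.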

Keywords

  • Hidden Markov Model
  • Recurrent Neural Network
  • Read Speech
  • Protein Secondary Structure Prediction
  • Maximum Likelihood Linear Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Nicole Beringer (1)
  • Alex Graves (1)
  • Florian Schiel (2)
  • Jürgen Schmidhuber (1, 3)

  1. IDSIA, Manno-Lugano, Switzerland
  2. Schiel BAS Services, Munich, Germany
  3. TU Munich, Garching, Munich, Germany
