Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition

  • Alex Graves
  • Santiago Fernández
  • Jürgen Schmidhuber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3697)

Abstract

In this paper, we carry out two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short-Term Memory (LSTM) networks. In the first experiment (framewise phoneme classification) we find that bidirectional LSTM (BLSTM) outperforms both unidirectional LSTM and conventional Recurrent Neural Networks (RNNs). In the second (phoneme recognition) we find that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system, as well as on a unidirectional LSTM-HMM system.

Keywords

Speech Recognition; Recurrent Neural Network; Phoneme Classification; Phoneme Recognition; Frame Delay
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Alex Graves (1)
  • Santiago Fernández (1)
  • Jürgen Schmidhuber (1, 2)
  1. IDSIA, Manno-Lugano, Switzerland
  2. TU Munich, Garching, Munich, Germany
