The Use of Recurrent Neural Networks in Continuous Speech Recognition

  • Tony Robinson
  • Mike Hochberg
  • Steve Renals
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 355)


This chapter describes the use of recurrent neural networks (i.e., networks in which feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems).
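
The conversion from posteriors to scaled likelihoods, followed by Viterbi decoding, can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: it assumes one HMM state per phone, strictly positive posterior estimates, and phone priors supplied by the caller (for instance, relative phone frequencies in the training data). The function names `scaled_log_likelihoods` and `viterbi` are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors):
    """Divide each frame's phone posteriors by the phone priors (log domain).

    By Bayes' rule, p(x | q) / p(x) = P(q | x) / P(q), so the result is the
    likelihood scaled by a factor that is the same for every phone in a frame.
    Assumes all posteriors and priors are strictly positive.
    """
    return [
        [math.log(p) - math.log(priors[q]) for q, p in enumerate(frame)]
        for frame in posteriors
    ]


def viterbi(log_obs, log_trans, log_init):
    """Return the best state sequence and its log score.

    log_obs[t][q]   : scaled log likelihood of state q at frame t
    log_trans[r][q] : log probability of moving from state r to state q
    log_init[q]     : log probability of starting in state q
    """
    n_states = len(log_init)
    score = [log_init[q] + log_obs[0][q] for q in range(n_states)]
    backptrs = []
    for frame in log_obs[1:]:
        new_score, ptrs = [], []
        for q in range(n_states):
            # Best predecessor of state q at this frame.
            best_prev = max(
                range(n_states), key=lambda r: score[r] + log_trans[r][q]
            )
            ptrs.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][q] + frame[q])
        score = new_score
        backptrs.append(ptrs)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda q: score[q])
    best_score = score[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path, best_score
```

Because the scaling factor p(x) is shared by all phones within a frame, it does not change which path the decoder prefers, which is why scaled likelihoods can stand in for the observation probabilities here.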


Keywords (added by machine, not by the authors): Speech Recognition; Recurrent Neural Network; Recurrent Network; Speech Recognition System; Continuous Speech Recognition





Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Tony Robinson (1)
  • Mike Hochberg (1)
  • Steve Renals (1)
  1. Engineering Department, Cambridge University, Cambridge, UK
