Part of the Signals and Communication Technology book series (SCT)


Automatic speech recognition (ASR) is an important technology for enabling and improving human–human and human–computer interaction. In this chapter, we introduce the main application areas of ASR systems, describe their basic architecture, and then outline the organization of the book.


Keywords: Speech Recognition; Gaussian Mixture Model; Recurrent Neural Network; Automatic Speech Recognition; Acoustic Model



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Microsoft Research, Bothell, USA
  2. Microsoft Research, Redmond, USA
