Current Methods in Continuous Speech Recognition

  • Chapter
Modern Methods of Speech Processing

Part of the book series: The Springer International Series in Engineering and Computer Science ((SECS,volume 327))

Abstract

Several significant advances have been made in continuous speech recognition over the last few years. In this chapter, we will discuss some of the current techniques in feature extraction and modeling for large vocabulary continuous speech recognition.

Copyright information

© 1995 Springer Science+Business Media New York

About this chapter

Cite this chapter

Gopalakrishnan, P.S. (1995). Current Methods in Continuous Speech Recognition. In: Ramachandran, R.P., Mammone, R.J. (eds) Modern Methods of Speech Processing. The Springer International Series in Engineering and Computer Science, vol 327. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2281-2_8

  • DOI: https://doi.org/10.1007/978-1-4615-2281-2_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5962-3

  • Online ISBN: 978-1-4615-2281-2

  • eBook Packages: Springer Book Archive
