Issues in Practical Large Vocabulary Isolated Word Recognition: The IBM Tangora System

Automatic Speech and Speaker Recognition

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

The IBM TANGORA was the first real-time PC-based large vocabulary isolated word dictation system [14]. Its development and eventual productization in the form of the IBM Personal Dictation System required substantial innovation in all areas of speech recognition, from signal processing to language modeling. This chapter describes some of the algorithmic techniques that had to be developed in order to create a dictation system that could actually be used by real users to create text.

References

  1. L.R. Bahl, S. Das, P.V. de Souza, M. Epstein, R.L. Mercer, B. Merialdo, D. Nahamoo, M.A. Picheny, J. Powell, “Automatic Phonetic Baseform Determination,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, Toronto, Canada, pp. 173–176, May 1991.

  2. L. Bahl, P. de Souza, P. S. Gopalakrishnan, D. Nahamoo, M. Picheny, “Robust methods for using context dependent features and models in a continuous speech recognizer,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, Adelaide, Australia, pp. I-533 – I-536, April 1994.

  3. P.F. Brown, “The acoustic modeling problem in automatic speech recognition,” Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, 1987.

  4. J. Cohen, “Application of an Adaptive Auditory Model to Speech Recognition,” J. Acoust. Soc. America, Supplement 1, Vol. 78, p. S50(A), 1985.

  5. S.K. Das, “Some Dimensionality Reduction Studies in Continuous Speech Recognition,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, Vol. 1, pp. 292–295, April 1983.

  6. S. Das, R. Bakis, A. Nadas, D. Nahamoo and M. Picheny, “Influence of Background Noise and Microphone on the Performance of the IBM Tangora Speech Recognition System,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. II-71 – II-74, April 1993.

  7. S. Das, A. Nadas, D. Nahamoo and M. Picheny, “Adaptation Techniques for Ambience and Microphone Compensation in the IBM Tangora Speech Recognition System,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. I-21 – I-24, April 1994.

  8. H.P. Friedman, J. Rubin, “On some invariant criteria for grouping data,” Journal of the American Statistical Association, pp. 1159–1178, Dec. 1967.

  9. A. Nadas, R.L. Mercer, L.R. Bahl, R. Bakis, P.S. Cohen, A.G. Cole, F. Jelinek and B.L. Lewis, “Continuous Speech Recognition with Automatically Selected Acoustic Prototypes Obtained by Either Bootstrapping or Clustering,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 1153–1155, March-April 1981.

  10. A. Nadas, D. Nahamoo and M.A. Picheny, “Adaptive Labeling: Normalization of Speech by Adaptive Transformations based on Vector Quantization,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 521–524, April 1988.

  11. M.R. Schroeder and J.L. Hall, “A Model for Mechanical to Neural Transduction in the Auditory Receptor,” J. Acoust. Soc. America, Vol. 55, pp. 1055–1060, 1974.

  12. F. Jelinek, B. Merialdo, S. Roukos and M. Strauss, “A Dynamic Language Model for Speech Recognition,” Proc. DARPA Speech and Natural Language Workshop, (Pacific Grove, CA), pp. 293–295, February 1991.

  13. L.R. Bahl, P.V. de Souza, D. Nahamoo, M.A. Picheny, S. Roukos, “Adaptation of Large Vocabulary Recognition System Parameters,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. I-477 – I-480, March 1992.

  14. A. Averbuch, L. Bahl, R. Bakis, P. Brown, A. Cole, G. Daggett, S. Das, K. Davies, S. DeGennaro, P. de Souza, E. Epstein, D. Fraleigh, F. Jelinek, S. Katz, B. Lewis, R. Mercer, A. Nadas, D. Nahamoo, M. Picheny, G. Shichman, P. Spinelli, “An IBM PC Based Large-Vocabulary Isolated-Utterance Speech Recognizer,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 53–56, April 1986.

  15. A. Averbuch, L. Bahl, R. Bakis, P. Brown, G. Daggett, S. Das, K. Davies, S. De Gennaro, P. de Souza, E. Epstein, D. Fraleigh, F. Jelinek, B. Lewis, R. Mercer, J. Moorehead, A. Nadas, D. Nahamoo, M. Picheny, G. Shichman, P. Spinelli, D. Van Compernolle and H. Wilkens, “Experiments with the Tangora 20,000 Word Speech Recognizer,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 701–704, April 1987.

  16. L.R. Bahl, J.K. Baker, P.S. Cohen, A.G. Cole, F. Jelinek, B.L. Lewis and R.L. Mercer, “Automatic Recognition of Continuously Spoken Sentences from a Finite State Grammar,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 418–421, April 1978.

  17. L.R. Bahl, J.K. Baker, P.S. Cohen, F. Jelinek, B.L. Lewis and R.L. Mercer, “Recognition of a Continuously Read Natural Corpus,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 422–424, April 1978.

  18. L.R. Bahl, R. Bakis, P.S. Cohen, A.G. Cole, F. Jelinek, B.L. Lewis, R.L. Mercer, “Further Results on the Recognition of a Continuously Read Natural Corpus,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 872–875, April 1980.

  19. L.R. Bahl, R. Bakis, P.S. Cohen, A. Cole, F. Jelinek, B.L. Lewis, R.L. Mercer, “Continuous Parameter Acoustic Processing for Speech Recognition of a Natural Speech Corpus,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 1149–1155, March-April 1981.

  20. L.R. Bahl, R. Bakis, P.S. Cohen, A. Cole, F. Jelinek, B.L. Lewis, R.L. Mercer, “Speech Recognition of a Natural Text Read as Isolated Words,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 1168–1171, March-April 1981.

  21. L.R. Bahl, P.F. Brown, P.V. de Souza, R.L. Mercer, M.A. Picheny, “Acoustic Markov Models used in the Tangora Speech Recognition System,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 497–500, April 1988.

  22. L.R. Bahl, F. Jelinek and R.L. Mercer, “A Maximum Likelihood Approach to Continuous Speech Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, pp. 179–190, March 1983.

  23. D. Van Compernolle, “Increased Noise Immunity in Large Vocabulary Speech Recognition with the Aid of Spectral Subtraction,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 1143–1146, April 1987.

  24. R.M. Gray, “Vector Quantization,” IEEE ASSP Magazine, 1(2):4–29, April 1984.

  25. J. Makhoul, S. Roucos and H. Gish, “Vector Quantization in Speech Coding,” Proc. IEEE, 73(11):1551–1588, Nov. 1985.

  26. G. Rigoll, “Baseform Adaptation for Large Vocabulary Hidden Markov Model Based Speech Recognition Systems,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 141–144, April 1990.

  27. R. Schwartz, C. Barry, Y.-L. Chow, A. Derr, M.-W. Feng, O. Kimball, F. Kubala, J. Makhoul and J. Vandergrift, “The BBN BYBLOS Continuous Speech Recognition System,” Proc. Speech and Natural Language Workshop, Feb. 1989.

  28. S. Katz, “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer,” IEEE Trans. on Acoustics, Speech and Signal Proc., pp. 400–401, March 1987.

  29. R. Kuhn, “Speech Recognition and the Frequency of Recently Used Words: a Modified Markov Model for Natural Language,” Proc. COLING Budapest, Vol. 1, pp. 348–350, July 1988.

  30. J. Kupiec, “Probabilistic Models of Short and Long Distance Word Dependencies in Running Text,” Proc. Speech and Natural Language DARPA Workshop, pp. 290–295, Feb. 1989.

  31. A. Acero, “Acoustical and Environmental Robustness in Automatic Speech Recognition,” Ph.D. Thesis, Carnegie Mellon University, September 1990.

  32. R.M. Stern, F-H Liu, Y. Ohshima, T.M. Sullivan and A. Acero, “Multiple Approaches to Robust Speech Recognition,” 5th DARPA Workshop on Speech and Natural Language, Arden Conference Center, Harriman, NY, Feb. 1992.

  33. H. Murveit, J. Butzberger, M. Weintraub, “Reduced Channel Dependence for Speech Recognition,” 5th DARPA Workshop on Speech and Natural Language, Arden Conference Center, Harriman, NY, Feb. 1992.

  34. S. Boll, J. Porter, L. Bahler, “Robust Syntax Free Speech Recognition,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 179–182, 1988.

  35. S. Furui, “Speaker-Independent Isolated Word Recognition Using Dynamic Features of Speech Spectrum,” IEEE Trans. on Acoustics, Speech and Signal Proc., vol. ASSP-34, pp. 52–59, 1986.

  36. F. Jelinek, “Continuous Speech Recognition by Statistical Methods,” Proc. IEEE, vol. 64, No. 4, pp. 532–556, April 1976.

  37. F. Jelinek, “The Development of an Experimental Discrete Dictation Recognizer,” Proc. IEEE, vol. 73, No. 11, pp. 1616–1624, Nov. 1985.

  38. D.H. Klatt, “Review of Text-to-Speech Conversion for English,” Jr. of the Acoustical Society of America, vol. 82, No. 3, pp. 737–793, Sept. 1987.

  39. J.M. Lucassen and R.L. Mercer, “An Information-Theoretic Approach to the Automatic Determination of Phonetic Baseforms,” Proc. IEEE International Conf. on Acoustics, Speech and Signal Processing, pp. 42.5.1–42.5.4, 1984.

  40. E. Zwicker, “Subdivision of the Audible Frequency Range into Critical Bands (Frequenzgruppen),” Jr. of the Acoustical Society of America, vol. 33, No. 2, p. 248, February 1961.

  41. E. Lombard, “Le Signe de l’Elevation de la Voix,” Ann. Maladies Oreille, Larynx, Nez, Pharynx, 37:101–119, 1911.

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Das, S.K., Picheny, M.A. (1996). Issues in Practical Large Vocabulary Isolated Word Recognition: The IBM Tangora System. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_19

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_19

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
