Context-Dependent Vector Clustering for Speech Recognition

  • Chapter
Automatic Speech and Speaker Recognition

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

The performance of a large vocabulary speech recognition system is critically tied to the quality of the acoustic prototypes that are established in the relevant feature space(s). This is especially true in continuous speech and/or for speaker-independent tasks, where pronunciation variability is the greatest. In this chapter, we will discuss a number of clustering techniques which can be used to derive high quality acoustic prototypes.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Bellegarda, J.R. (1996). Context-Dependent Vector Clustering for Speech Recognition. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_6

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
