Hidden Markov Network for Precise and Robust Acoustic Modeling

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

This chapter discusses the structure of acoustic models and training algorithms for speech recognition. As is generally recognized, a more complex acoustic model demands more training data. One effective solution is tying at multiple levels, such as the allophone, state, distribution, or parameter level. Tied structures such as generalized triphones, state tying, and tied mixtures have been one of the main streams of research in acoustic modeling of speech. They offer not only precise and robust modeling but also a significant computational advantage. This chapter introduces the Hidden Markov Network (HMnet), which is derived by the Successive State Splitting algorithm. The ultimate goal is an acoustic model with a fully tied structure at four levels. Vector Field Smoothing (VFS) for speaker adaptation is also discussed as a way to train acoustic models more efficiently.

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Sagayama, S. (1996). Hidden Markov Network for Precise and Robust Acoustic Modeling. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_7

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
