
From HMMs to Segment Models: Stochastic Modeling for CSR

  • Mari Ostendorf
Part of The Kluwer International Series in Engineering and Computer Science book series (SECS, volume 355)

Abstract

In recent years, several alternative models have been proposed to address some of the shortcomings of the hidden Markov model (HMM), currently the most popular approach to speech recognition. Many of these models, which attempt to represent trends or correlation of observations over time, can broadly be classified as segment models. This chapter describes a general probabilistic framework for segment models, including HMMs as a special case, giving options for modeling assumptions in terms of correlation structure and parameter tying and outlining the extensions to HMM recognition and training algorithms needed to handle segment models.

Keywords

Hidden Markov Model; Speech Recognition; State Sequence; Distribution Assumption; Hidden State
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Mari Ostendorf, Dept. of Electrical, Computer and Systems Engineering, Boston University, Boston, USA
