
An Acoustic-Phonetic and a Model-Theoretic Analysis of Subspace Distribution Clustering Hidden Markov Models

International Journal of Speech Technology

Abstract

Recently, we proposed a new derivative of conventional continuous density hidden Markov modeling (CDHMM) that we call “subspace distribution clustering hidden Markov modeling” (SDCHMM). SDCHMMs can be created by tying low-dimensional subspace Gaussians in CDHMMs. In the tasks we tried, usually only 32–256 subspace Gaussian prototypes were needed for an SDCHMM-based system to maintain the recognition performance of the original CDHMM-based system, a reduction in Gaussian parameters of one to three orders of magnitude. Consequently, both recognition time and memory requirements were greatly reduced. We have also shown that, if the underlying subspace distribution tying structure is known, it may be used to train an SDCHMM-based system from scratch with as little as eight minutes of speech. All these results suggest that there is substantial redundancy in conventional CDHMMs and that the SDCHMM is a more compact model. In this paper, we analyze the tying structure from two perspectives: from the acoustic-phonetic perspective, showing that the tying structure seems to capture prominent relationships among phones; and from the model-theoretic perspective, showing that SDCHMMs, if properly created from CDHMMs, may be preferred over the latter because they are less complex and have the potential for greater generalization power.
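The abstract describes an SDCHMM as a CDHMM whose low-dimensional subspace Gaussians are tied to a small set of prototypes. The following is a minimal Python sketch of that tying step, built on assumptions that are not taken from the paper: diagonal covariances, a fixed partition of feature dimensions into streams, a k-means-style loop that uses Bhattacharyya distance, and an equal-weight moment-matched merge as the prototype update. The names `project`, `tie_stream`, and `gaussian_params`, the stream split, and all numbers in the demo are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# A diagonal-covariance Gaussian: (means, variances), one entry per dimension.
Gaussian = Tuple[List[float], List[float]]


def project(g: Gaussian, stream: Sequence[int]) -> Gaussian:
    """Restrict a full-space diagonal Gaussian to one subspace (stream)."""
    mean, var = g
    return [mean[d] for d in stream], [var[d] for d in stream]


def bhattacharyya(a: Gaussian, b: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal Gaussians."""
    dist = 0.0
    for ma, va, mb, vb in zip(a[0], a[1], b[0], b[1]):
        v = 0.5 * (va + vb)
        dist += 0.125 * (ma - mb) ** 2 / v + 0.5 * math.log(v / math.sqrt(va * vb))
    return dist


def merge(members: List[Gaussian]) -> Gaussian:
    """Equal-weight moment-matched merge of several subspace Gaussians."""
    n, dim = len(members), len(members[0][0])
    mean = [sum(m[0][d] for m in members) / n for d in range(dim)]
    var = [sum(m[1][d] + m[0][d] ** 2 for m in members) / n - mean[d] ** 2
           for d in range(dim)]
    return mean, var


def nearest(g: Gaussian, prototypes: List[Gaussian]) -> int:
    """Index of the prototype closest to g in Bhattacharyya distance."""
    return min(range(len(prototypes)), key=lambda k: bhattacharyya(g, prototypes[k]))


def tie_stream(gaussians: List[Gaussian], num_prototypes: int,
               iters: int = 10, seed: int = 0) -> Tuple[List[Gaussian], List[int]]:
    """Hypothetical k-means-style tying of one stream's subspace Gaussians."""
    rng = random.Random(seed)
    prototypes = rng.sample(gaussians, num_prototypes)
    for _ in range(iters):
        assign = [nearest(g, prototypes) for g in gaussians]
        for k in range(num_prototypes):
            members = [g for g, a in zip(gaussians, assign) if a == k]
            if members:  # keep the old prototype if its cluster emptied
                prototypes[k] = merge(members)
    # Final assignment against the updated prototypes.
    return prototypes, [nearest(g, prototypes) for g in gaussians]


def gaussian_params(num_gaussians: int, dim: int, num_prototypes: int) -> Tuple[int, int]:
    """Mean+variance counts before and after tying (index tables excluded).

    If every stream is tied to the same number of prototypes, the streams
    together hold 2 * dim * num_prototypes parameters, whatever the split.
    """
    return 2 * dim * num_gaussians, 2 * dim * num_prototypes


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 6
    streams = [[0, 1], [2, 3], [4, 5]]  # hypothetical 3-stream split
    cdhmm = [([rng.gauss(0, 1) for _ in range(dim)],
              [rng.uniform(0.5, 2.0) for _ in range(dim)]) for _ in range(200)]
    for s in streams:
        protos, assign = tie_stream([project(g, s) for g in cdhmm], num_prototypes=8)
        print(s, len(protos), "prototypes; first indices:", assign[:5])
    full, tied = gaussian_params(num_gaussians=20000, dim=39, num_prototypes=64)
    print(full, tied, full / tied)  # 1560000 4992 312.5
```

Because each stream keeps only `num_prototypes` Gaussians, the Gaussian-parameter ratio in this sketch reduces to `num_gaussians / num_prototypes`; the per-Gaussian stream indices add a small lookup table that is not counted here. A moment-matched merge was chosen for simplicity; other centroid definitions for Bhattacharyya-based clustering exist.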




About this article

Cite this article

Mak, B. An Acoustic-Phonetic and a Model-Theoretic Analysis of Subspace Distribution Clustering Hidden Markov Models. International Journal of Speech Technology 7, 55–68 (2004). https://doi.org/10.1023/B:IJST.0000004808.66516.0b
