An Acoustic-Phonetic and a Model-Theoretic Analysis of Subspace Distribution Clustering Hidden Markov Models

Mak, Brian

doi:10.1023/B:IJST.0000004808.66516.0b

Published: January 2004

Volume 7, pages 55–68, (2004)
International Journal of Speech Technology

Abstract

Recently, we proposed a new derivative of conventional continuous density hidden Markov modeling (CDHMM) that we call “subspace distribution clustering hidden Markov modeling” (SDCHMM). SDCHMMs can be created by tying low-dimensional subspace Gaussians in CDHMMs. In the tasks we tried, usually only 32–256 subspace Gaussian prototypes were needed in an SDCHMM-based system to maintain the recognition performance of its original CDHMM-based system, a reduction of Gaussian parameters by one to three orders of magnitude. Consequently, both recognition time and memory were greatly reduced. We have also shown that if the underlying subspace distribution tying structure is known, it may be used to train an SDCHMM-based system from scratch with as little as eight minutes of speech. All the results suggest that there is substantial redundancy in conventional CDHMMs and that the SDCHMM is a more compact model. In this paper, we analyze the tying structure from two perspectives: from the acoustic-phonetic perspective, showing that the tying structure seems to capture prominent relationships among phones; and from the model-theoretic perspective, showing that SDCHMMs, if properly created from CDHMMs, may be preferred over the latter as they are less complex and have the potential for greater generalization power.
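
The parameter reduction described in the abstract can be illustrated with a rough count. The following is only a sketch under stated assumptions, not the paper's own accounting: it assumes diagonal-covariance Gaussians, a feature vector split into equal-sized subspaces (streams), and the same number of prototypes in every stream. The model sizes (20,000 Gaussians, 39 dimensions, 13 streams, 128 prototypes) are hypothetical values chosen for illustration, and the function names are not from the article.

```python
from typing import Sequence


def cdhmm_gaussian_params(num_gaussians: int, dim: int) -> int:
    """Each full-space diagonal Gaussian stores one mean and one variance per dimension."""
    return num_gaussians * 2 * dim


def sdchmm_gaussian_params(num_prototypes: int, stream_dims: Sequence[int]) -> int:
    """Each stream keeps its own codebook of subspace Gaussian prototypes
    (assumed here: the same number of prototypes in every stream)."""
    return sum(num_prototypes * 2 * d for d in stream_dims)


def sdchmm_index_entries(num_gaussians: int, num_streams: int) -> int:
    """Each original Gaussian is replaced by one prototype index per stream.
    These are small integers, not Gaussian parameters."""
    return num_gaussians * num_streams


# Hypothetical sizes, for illustration only.
NUM_GAUSSIANS = 20_000      # Gaussians in the original CDHMM system
DIM = 39                    # feature dimension
STREAM_DIMS = [3] * 13      # 13 three-dimensional subspaces
NUM_PROTOTYPES = 128        # within the 32-256 range quoted in the abstract

full = cdhmm_gaussian_params(NUM_GAUSSIANS, DIM)
tied = sdchmm_gaussian_params(NUM_PROTOTYPES, STREAM_DIMS)
ratio = full / tied

print(f"CDHMM Gaussian parameters:  {full}")
print(f"SDCHMM Gaussian parameters: {tied}")
print(f"Reduction factor:           {ratio:.2f}")
print(f"Prototype indices stored:   {sdchmm_index_entries(NUM_GAUSSIANS, len(STREAM_DIMS))}")
```

Under these assumptions the reduction factor is simply the ratio of original Gaussians to prototypes per stream, which is why a few hundred prototypes against tens of thousands of Gaussians gives a saving of roughly two orders of magnitude. The per-Gaussian prototype indices still have to be stored, but they are not counted as Gaussian parameters.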

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Brian Mak

About this article

Cite this article

Mak, B. An Acoustic-Phonetic and a Model-Theoretic Analysis of Subspace Distribution Clustering Hidden Markov Models. International Journal of Speech Technology 7, 55–68 (2004). https://doi.org/10.1023/B:IJST.0000004808.66516.0b

Issue Date: January 2004
DOI: https://doi.org/10.1023/B:IJST.0000004808.66516.0b
