Abstract
To improve speaker recognition performance, an embedded linear transformation can be used to integrate the feature transformation and the diagonal-covariance Gaussian mixture model (GMM) into a unified framework. In this case, the mixture number of the GMM must be fixed before model training. The cluster expectation-maximization (EM) algorithm is a well-known technique in which the mixture number is treated as a parameter to be estimated. This paper presents a new model structure that integrates a multi-step cluster algorithm into the estimation process of the GMM with embedded transformation. In this approach, the transformation matrix, the mixture number, and the model parameters are estimated simultaneously according to a maximum likelihood criterion. The proposed method is evaluated on a database of three data sessions for text-independent speaker identification. The experiments show that this method outperforms the traditional GMM trained with the cluster EM algorithm.
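As a rough illustration of the kind of criterion the abstract describes (a sketch under stated assumptions, not the paper's own formulation), consider an invertible D×D transformation matrix A applied to each of T feature vectors x_t, and a K-component GMM with diagonal covariances in the transformed space. The symbols A, K, D, T, w_k, \mu_k, \Sigma_k and the parameter count L are notation introduced here for illustration; the penalty term follows the minimum description length (MDL) form commonly associated with the Cluster algorithm and is assumed, not confirmed, to match the paper.

```latex
% Log-likelihood of the data under a GMM with an embedded linear transformation
% (change of variables y_t = A x_t contributes the log|det A| term)
\log p(X \mid A, K, \theta)
  = T \log \lvert \det A \rvert
  + \sum_{t=1}^{T} \log \sum_{k=1}^{K} w_k \,
    \mathcal{N}\!\left(A x_t ;\, \mu_k, \Sigma_k\right),
\qquad
\Sigma_k = \operatorname{diag}\!\left(\sigma_{k1}^2, \dots, \sigma_{kD}^2\right)

% MDL-style criterion: negative log-likelihood plus a complexity penalty
\mathrm{MDL}(K, A, \theta)
  = -\log p(X \mid A, K, \theta) + \tfrac{1}{2}\, L \log(T D),
\qquad
L = K(2D + 1) - 1
```

Here L counts K - 1 free weights, KD means, and KD variances. The D^2 entries of A are left out of L because they add the same constant for every K and so do not change which K gives the smallest criterion. Under these assumptions, one would re-estimate A and \theta by EM for each candidate K and keep the K with the smallest MDL value.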
Additional information
This text was submitted by the authors in English.
About this article
Cite this article
Xu, L., Tang, Z. Speaker identification using multi-step clustering algorithm with transformation-based GMM. Aut. Control Comp. Sci. 41, 224–231 (2007). https://doi.org/10.3103/S0146411607040062
DOI: https://doi.org/10.3103/S0146411607040062