Speaker identification using multi-step clustering algorithm with transformation-based GMM

Automatic Control and Computer Sciences

Abstract

To improve speaker recognition performance, an embedded linear transformation is used to integrate the feature transformation and the diagonal-covariance Gaussian mixture into a unified framework. In this case, the mixture number of the Gaussian mixture model (GMM) must be fixed before model training. The cluster expectation-maximization (EM) algorithm is a well-known technique in which the mixture number is instead treated as a parameter to be estimated. This paper presents a new model structure that integrates a multi-step cluster algorithm into the estimation of a GMM with the embedded transformation. In this approach, the transformation matrix, the mixture number, and the model parameters are estimated simultaneously under a maximum likelihood criterion. The proposed method is evaluated on a database of three data sessions for text-independent speaker identification. The experiments show that it outperforms the traditional GMM trained with the cluster EM algorithm.
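As a rough illustration of treating the mixture number as an estimated parameter, the sketch below fits diagonal-covariance GMMs with plain EM for several candidate mixture numbers and keeps the one with the lowest penalized score. It is not the authors' multi-step cluster algorithm and it omits the embedded transformation entirely. The abstract does not state the selection criterion, so the minimum description length (MDL) style penalty used here is an assumption, as are all function names, the variance floor, and the exhaustive search over candidates (a cluster-style algorithm would typically start large and merge components instead).

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def component_logs(x, weights, means, variances):
    """log(w_j) + log N(x | mean_j, var_j) for every component j."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm(data, k, iters=50, seed=0, var_floor=1e-2):
    """Plain EM for a k-component diagonal GMM. Returns (params, log_likelihood)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    means = [list(p) for p in rng.sample(data, k)]  # start means at random points
    variances = [[1.0] * dim for _ in range(k)]     # assumed initial variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, weights, means, variances)
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / n
            for d in range(dim):
                mu = sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                var = sum(r[j] * (x[d] - mu) ** 2 for r, x in zip(resp, data)) / nj
                means[j][d] = mu
                variances[j][d] = max(var, var_floor)
    loglik = sum(
        logsumexp(component_logs(x, weights, means, variances)) for x in data
    )
    return (weights, means, variances), loglik


def mdl_score(loglik, k, n, dim):
    """Assumed MDL-style score: negative log-likelihood plus a complexity penalty."""
    # Free parameters: (k - 1) weights, k * dim means, k * dim variances.
    n_params = k * (1 + 2 * dim) - 1
    return -loglik + 0.5 * n_params * math.log(n * dim)


def select_mixture_number(data, max_k=5):
    """Fit each candidate mixture number and return the best one with all scores."""
    scores = {}
    for k in range(1, max_k + 1):
        _, loglik = fit_gmm(data, k)
        scores[k] = mdl_score(loglik, k, len(data), len(data[0]))
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `select_mixture_number(features, max_k=5)` on a list of equal-length feature tuples returns the chosen mixture number and the score for each candidate. The penalty term grows with the number of free parameters, so extra components are only kept when they raise the likelihood enough to pay for themselves.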



Author information

Correspondence to Limin Xu.

Additional information

This text was submitted by the authors in English.

About this article

Cite this article

Xu, L., Tang, Z. Speaker identification using multi-step clustering algorithm with transformation-based GMM. Aut. Control Comp. Sci. 41, 224–231 (2007). https://doi.org/10.3103/S0146411607040062


  • DOI: https://doi.org/10.3103/S0146411607040062
