Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model

  • Wan-Chen Chen
  • Ching-Tang Hsieh
  • Eugene Lai
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3248)


This paper presents an effective method for improving the performance of a speaker identification system. Exploiting the multiresolution property of the wavelet transform, the input speech signal is decomposed into several frequency bands so that noise distortions are not spread over the entire feature space. The linear predictive cepstral coefficients (LPCCs) of each band are calculated, and cepstral mean normalization is applied to all computed features. Feature recombination and likelihood recombination methods are evaluated on a text-independent speaker identification task. The feature recombination scheme combines the cepstral coefficients of each band to form a single feature vector used to train a Gaussian mixture model (GMM). The likelihood recombination scheme combines the likelihood scores of independent GMMs, one for each band. Experimental results show that both proposed methods outperform a GMM using full-band LPCCs and mel-frequency cepstral coefficients (MFCCs) in both clean and noisy environments.
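The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a one-level Haar split stands in for the wavelet decomposition (the abstract does not name the wavelet or the number of levels), feature recombination is taken to be frame-wise concatenation, likelihood recombination is taken to be a sum of per-band log-likelihoods, and the GMMs use diagonal covariances. All function names are hypothetical, and LPCC extraction is omitted, so the per-band feature matrices are assumed to be given.

```python
import numpy as np

def haar_split(signal):
    """One level of a Haar wavelet transform (assumed stand-in wavelet).
    Returns the low-band (approximation) and high-band (detail) signals."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:                      # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def cepstral_mean_normalize(features):
    """Subtract the per-coefficient mean taken over all frames (rows)."""
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)

def feature_recombination(band_features):
    """Join per-band feature matrices (frames x coeffs) into one matrix.
    Assumes every band has the same number of frames."""
    return np.hstack(band_features)

def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of frames under a diagonal-covariance GMM.
    weights: (M,), means and variances: (M, D), frames: (T, D)."""
    frames = np.asarray(frames, dtype=float)
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                         # (T, M)
    peak = log_comp.max(axis=1, keepdims=True)                    # log-sum-exp for stability
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())

def likelihood_recombination(band_features, band_gmms):
    """Sum the log-likelihoods of independent per-band GMMs (assumed rule).
    band_gmms holds one (weights, means, variances) tuple per band."""
    return sum(diag_gmm_loglik(f, *g) for f, g in zip(band_features, band_gmms))

def identify(band_features, speaker_models):
    """Return the index of the speaker whose band GMMs score highest."""
    scores = [likelihood_recombination(band_features, m) for m in speaker_models]
    return int(np.argmax(scores))
```

Under the feature recombination scheme, the output of `feature_recombination` would be scored against a single GMM per speaker with `diag_gmm_loglik`; under the likelihood recombination scheme, `identify` compares the summed band scores across speakers.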


Gaussian Mixture Model; Dynamic Time Warping; Speaker Recognition; Speaker Verification; Speaker Identification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Wan-Chen Chen (1)
  • Ching-Tang Hsieh (2)
  • Eugene Lai (2)
  1. Dept. of Electronic Engineering, St. John’s & St. Mary’s Institute of Technology, Taipei
  2. Dept. of Electrical Engineering, Tamkang University, Taipei
