
New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition

International Journal of Speech Technology

Abstract

Most existing speaker recognition systems are based on the basic GMM, the state-of-the-art GMM-UBM, the SVM or, more recently, GMM-SVM modeling. In this paper, a new scheme for Automatic Speaker Recognition (ASR), namely GMM-PCA-SVM, is presented. Dimensionality reduction using Principal Component Analysis (PCA), which was previously applied in the front-end processing, is now incorporated into the core of the GMM-SVM modeling stage in order to reduce the size of the adapted mean vectors derived from the Universal Background Model (UBM). A comparative study using Mel Frequency Cepstral Coefficients (MFCC) with Cepstral Mean Subtraction (CMS), extracted from the TIMIT database, is performed for speaker recognition in clean and noisy environments. The results show that the proposed scheme is promising for the ASR task: recognition performance with the proposed GMM-PCA-SVM method is significantly improved compared with conventional SVM or GMM-SVM based systems.
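
The data flow described in the abstract (CMS-normalized cepstral features, UBM mean adaptation, stacking of the adapted means, PCA reduction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the synthetic frames standing in for TIMIT MFCCs, the randomly initialized diagonal-covariance UBM (a real one would be trained with EM), the relevance factor of 16, the mixture and PCA sizes, and all function names. The final SVM stage is left out; the reduced vectors would be passed to any SVM trainer.

```python
import numpy as np

def cms(features):
    """Cepstral mean subtraction: remove each coefficient's per-utterance mean."""
    return features - features.mean(axis=0, keepdims=True)

def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """MAP-adapt UBM means to one utterance (means only, diagonal covariances)."""
    diff = features[:, None, :] - means[None, :, :]             # (T, C, D)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob                       # (T, C)
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                     # frame posteriors
    counts = post.sum(axis=0)                                   # soft counts, (C,)
    frame_means = (post.T @ features) / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]            # adaptation weight
    return alpha * frame_means + (1.0 - alpha) * means

def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
N_SPEAKERS, UTTS_PER_SPEAKER, N_FRAMES, N_CEPS, N_MIX, N_PCA = 4, 5, 200, 12, 8, 10

# Stand-in UBM: random means, unit variances, uniform weights (not EM-trained).
ubm_weights = np.full(N_MIX, 1.0 / N_MIX)
ubm_means = rng.normal(size=(N_MIX, N_CEPS))
ubm_vars = np.ones((N_MIX, N_CEPS))

supervectors, labels = [], []
for spk in range(N_SPEAKERS):
    spk_scale = rng.uniform(0.5, 1.5, size=N_CEPS)  # synthetic speaker trait
    for _ in range(UTTS_PER_SPEAKER):
        frames = rng.normal(size=(N_FRAMES, N_CEPS)) * spk_scale
        adapted = map_adapt_means(cms(frames), ubm_weights, ubm_means, ubm_vars)
        supervectors.append(adapted.ravel())        # stack adapted means
        labels.append(spk)

supervectors = np.vstack(supervectors)              # one row per utterance
pca_mean, pca_components = fit_pca(supervectors, N_PCA)
reduced = apply_pca(supervectors, pca_mean, pca_components)
print(supervectors.shape, reduced.shape)
```

In this sketch each utterance becomes a supervector of N_MIX × N_CEPS values, and PCA shrinks it to N_PCA values before classification, which is the size reduction the abstract refers to. The PCA basis would normally be estimated on training supervectors only and then reused unchanged for test utterances.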



Author information

Corresponding author

Correspondence to Kawthar Yasmine Zergat.


About this article


Cite this article

Zergat, K.Y., Amrouche, A. New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition. Int J Speech Technol 17, 373–381 (2014). https://doi.org/10.1007/s10772-014-9235-7


  • DOI: https://doi.org/10.1007/s10772-014-9235-7
