International Journal of Speech Technology, Volume 17, Issue 4, pp 373–381

New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition



Most existing speaker recognition systems are based on the basic GMM, the state-of-the-art GMM-UBM, the SVM or, more recently, GMM-SVM modeling. In this paper, a new scheme for Automatic Speaker Recognition (ASR), namely GMM-PCA-SVM, is presented. Dimensionality reduction using Principal Component Analysis (PCA), which was previously applied in the front-end process, is now incorporated in the core of the GMM-SVM modeling stage in order to reduce the size of the adapted mean vectors obtained from the Universal Background Model (UBM). A comparative study, using Mel Frequency Cepstral Coefficients (MFCC) with Cepstral Mean Subtraction (CMS) extracted from the TIMIT database, is performed for speaker recognition in clean and noisy environments. The results show that the proposed scheme is a promising approach for the ASR task: recognition performance with the proposed GMM-PCA-SVM method is significantly improved compared with conventional SVM or GMM-SVM based systems.
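The core idea of the scheme, reducing the stacked adapted means (the GMM supervector) with PCA before classification, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a diagonal-covariance UBM, mean-only relevance-MAP adaptation in the style of Reynolds et al. (2000) with a relevance factor of 16 (an assumed value, not taken from the paper), and synthetic frames in place of TIMIT MFCC features. The function names `map_adapt_means`, `fit_pca` and `apply_pca` are hypothetical. The final SVM stage is omitted; the reduced vectors would be passed to any SVM trainer.

```python
import numpy as np

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only relevance-MAP adaptation of a diagonal-covariance UBM.

    frames: (T, D) features; weights: (C,); means, variances: (C, D).
    Returns the adapted means, shape (C, D). relevance=16 is an assumption.
    """
    # Log-density of every frame under every diagonal Gaussian component.
    diff = frames[:, None, :] - means[None, :, :]              # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                      # (T, C)
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                     # responsibilities

    n = post.sum(axis=0)                                        # soft counts, (C,)
    data_mean = (post.T @ frames) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]                      # adaptation weights
    return alpha * data_mean + (1.0 - alpha) * means

def fit_pca(supervectors, n_components):
    """Return the training mean and the leading principal axes (rows)."""
    mean = supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(supervectors, mean, components):
    """Project supervectors onto the retained principal axes."""
    return (supervectors - mean) @ components.T

# Synthetic stand-in for a trained UBM and per-utterance MFCC frames.
rng = np.random.default_rng(0)
C, D = 4, 3                                   # components, feature dimension
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))

# One supervector (C * D values) per synthetic utterance.
supervectors = np.stack([
    map_adapt_means(rng.normal(loc=shift, size=(200, D)),
                    ubm_w, ubm_mu, ubm_var).ravel()
    for shift in np.linspace(-1.0, 1.0, 10)
])

pca_mean, pca_axes = fit_pca(supervectors, n_components=5)
reduced = apply_pca(supervectors, pca_mean, pca_axes)   # input to the SVM stage
```

In this sketch the PCA basis is estimated on the training supervectors only and then reused for test utterances through `apply_pca`, so that training and test vectors live in the same reduced space.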


Keywords: Speaker recognition; Dimensionality reduction; Support vector machine (SVM); PCA; Gaussian supervector (GMM-SVM); GMM-PCA-SVM; Noisy environments



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Kawthar Yasmine Zergat (1)
  • Abderrahmane Amrouche (1)

  1. Speech Com. & Signal Proc. Lab.-LCPTS, Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar, Algeria
