Language and Text-Independent Speaker Recognition System Using Energy Spectrum and MFCCs

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 566)

Abstract

Speaker identification, especially in critical environments, has always been a subject of great interest. In this paper, we present a language- and text-independent speaker identification algorithm that can automatically identify a speaker in an audio signal containing noise or real-environment background sound. The method pairs two audio features, the energy spectrum and MFCCs, both derived from the Discrete Fourier Transform (DFT). The audio features extracted in real time are then compared using Euclidean distance, which measures the difference between speakers, to obtain the most likely speaker. The energy spectrum feature is adopted to supplement the MFCC features and yield higher recognition accuracy for speaker identification.

The proposed technique is tested with 30 different speakers in three languages. The experimental results show that the speaker identification algorithm, using energy spectrum and MFCC features with Euclidean distance, can effectively identify speakers against noise or real-environment background sound, independent of language and text, with an accuracy of more than 83%. Notably, our approach is not language-specific; it can identify speakers in more than one language.
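
The pipeline outlined in the abstract (DFT-based energy spectrum and MFCC features, matched by Euclidean distance) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give frame sizes, filter counts, or how the two feature types are combined, so the function names, the parameters `n_filters`, `n_mfcc` and `n_bands`, the band-wise log-energy form of the energy feature, and the simple concatenation are all hypothetical. Synthetic tones stand in for speech frames.

```python
import cmath
import math


def energy_spectrum(frame):
    """Energy spectrum |X[k]|^2 for k = 0..N/2 via a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_energies(power, sample_rate, n_filters):
    """Apply triangular filters spaced evenly on the mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = nyquist / (len(power) - 1)
    out = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        total = 0.0
        for k, p in enumerate(power):
            f = k * bin_hz
            if lo < f <= mid:
                total += p * (f - lo) / (mid - lo)
            elif mid < f < hi:
                total += p * (hi - f) / (hi - mid)
        out.append(total)
    return out


def features(frame, sample_rate, n_filters=20, n_mfcc=12, n_bands=8):
    """Concatenate band log-energies and MFCCs (combination is an assumption)."""
    power = energy_spectrum(frame)
    # MFCCs: log mel energies followed by a DCT-II (coefficient 0 dropped).
    log_mel = [math.log(e + 1e-10)
               for e in mel_energies(power, sample_rate, n_filters)]
    mfcc = [
        sum(v * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, v in enumerate(log_mel))
        for c in range(1, n_mfcc + 1)
    ]
    # Energy feature (assumed form): log energy in equal-width frequency bands.
    size = math.ceil(len(power) / n_bands)
    bands = [math.log(sum(power[i:i + size]) + 1e-10)
             for i in range(0, len(power), size)]
    return bands + mfcc


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, enrolled):
    """Return the enrolled name whose template is nearest to the probe."""
    return min(enrolled, key=lambda name: euclidean(probe, enrolled[name]))


def tone(freq, sample_rate=8000, n=256, phase=0.0):
    """Synthetic stand-in for a speech frame (not real speech)."""
    return [math.sin(2 * math.pi * freq * t / sample_rate + phase)
            for t in range(n)]


# Hypothetical enrolment: one template per "speaker".
enrolled = {
    "speaker_a": features(tone(250), 8000),
    "speaker_b": features(tone(1000), 8000),
}
print(identify(features(tone(1000, phase=0.7), 8000), enrolled))
```

A practical system would presumably average features over many windowed frames per speaker and use an FFT rather than the naive DFT shown here; the nearest-template rule is kept deliberately simple to mirror the Euclidean-distance comparison the abstract mentions.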

Keywords

Speaker identification; Energy spectrum; MFCCs

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
  2. Faculty of Industrial Technology and Management, King Mongkut’s University of Technology North Bangkok, Prachinburi, Thailand
