IC2IT 2017: Recent Advances in Information and Communication Technology 2017, pp 349-357
Language and Text-Independent Speaker Recognition System Using Energy Spectrum and MFCCs
Abstract
Speaker identification, especially in critical environments, has always been a subject of great interest. In this paper, we present a language- and text-independent speaker identification algorithm that can automatically identify a speaker in an audio signal containing noise or real environmental sound in the background. The method pairs two audio features, the Energy spectrum and MFCCs, both derived from the Discrete Fourier transform (DFT). The features extracted in real time are then compared using the Euclidean distance, which measures the difference between speakers, to obtain the most likely speaker. The Energy spectrum feature supplements the MFCC features to yield higher recognition accuracy for speaker identification.
The proposed technique was tested with 30 different speakers in three languages. The experimental results show that the speaker identification algorithm, using Energy spectrum and MFCC features with the Euclidean distance, can effectively identify a speaker against noise or real environmental background sound, independent of language and text, with an accuracy of more than 83%. Notably, our approach is not language-specific; it can identify speakers in more than one language.
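The pipeline summarized in the abstract (DFT-based Energy spectrum and MFCC features, matched by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, the filter and coefficient counts, the unweighted concatenation of the two feature sets, and the helper names `tone`, `features`, and `identify` are all assumptions made for illustration, and synthetic tones stand in for enrolled speakers.

```python
import cmath
import math

def energy_spectrum(frame):
    """Squared DFT magnitude for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        out.append(abs(s) ** 2)
    return out

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(power, sample_rate, n_filters=12, n_coeffs=8):
    """Log mel-band energies followed by a DCT-II (sizes are assumptions)."""
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

def features(frame, sample_rate):
    """Energy spectrum concatenated with MFCCs (unweighted: an assumption)."""
    power = energy_spectrum(frame)
    return power + mfcc(power, sample_rate)

def identify(query, enrolled):
    """Return the enrolled label whose feature vector is nearest to the query."""
    return min(enrolled, key=lambda name: math.dist(query, enrolled[name]))

def tone(freq, sample_rate=8000, n=64, phase=0.0):
    """Hypothetical stand-in for a speech frame: a single sine tone."""
    return [math.sin(2 * math.pi * freq * t / sample_rate + phase)
            for t in range(n)]

# Enroll two synthetic "speakers", then match a phase-shifted query frame.
SR = 8000
enrolled = {"A": features(tone(250), SR), "B": features(tone(1000), SR)}
best = identify(features(tone(250, phase=0.7), SR), enrolled)
```

A real system would average features over many frames per speaker and would likely normalize the two feature sets before computing the distance, since raw spectral energies can dominate the MFCC terms.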
Keywords
Speaker identification, Energy spectrum, MFCCs