Abstract
This paper proposes a noise-robust automatic speaker identification (ASI) scheme named MKMFCC–SVM. It is based on the Multiple Kernel Weighted Mel Frequency Cepstral Coefficient (MKMFCC) and the support vector machine (SVM). First, the MKMFCC is employed to extract features from degraded audio; it uses multiple kernels, such as the exponential and tangential kernels, to weight the MFCCs. Second, the extracted features are classified with the SVM. A comparative study is performed between the proposed MKMFCC–SVM and the MFCC–SVM ASI schemes, using the MKMFCC and MFCCs with five feature extraction schemes on telephone-like and noise-degraded audio signals. Experimental tests show that the proposed MKMFCC–SVM ASI scheme yields a higher identification rate in the presence of noise or degradation.
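The kernel-weighting step can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: the function names `kernel_weights` and `weight_frames`, the parameters `mix` and `scale`, the reading of "tangential" as a hyperbolic-tangent kernel, and the choice to combine the two kernels as a convex mixture over the normalized coefficient index. It is not the scheme evaluated in the paper.

```python
import math


def kernel_weights(num_coeffs, mix=0.5, scale=1.0):
    """Hypothetical per-coefficient weights mixing two kernels.

    Assumption: the kernels act on the normalized cepstral index and
    are blended by `mix`; the paper's actual formula may differ.
    """
    weights = []
    for k in range(num_coeffs):
        x = scale * k / num_coeffs      # normalized coefficient index
        exp_kernel = math.exp(-x)       # exponential kernel, decays with index
        tanh_kernel = math.tanh(x)      # tanh kernel, grows with index
        weights.append(mix * exp_kernel + (1.0 - mix) * tanh_kernel)
    return weights


def weight_frames(mfcc_frames, weights):
    """Scale every MFCC frame (a list of coefficients) by the weights."""
    return [[w * c for w, c in zip(weights, frame)] for frame in mfcc_frames]


# Toy input: two frames of four made-up cepstral coefficients.
frames = [[2.0, 1.0, -0.5, 0.25], [1.5, -1.0, 0.5, 0.0]]
weights = kernel_weights(num_coeffs=4)
weighted = weight_frames(frames, weights)
```

In a full pipeline of the kind the abstract describes, the weighted frames would then be passed to an SVM classifier trained on per-speaker labels; that stage is omitted here.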
Cite this article
Faragallah, O.S. Robust noise MKMFCC–SVM automatic speaker identification. Int J Speech Technol 21, 185–192 (2018). https://doi.org/10.1007/s10772-018-9494-9
DOI: https://doi.org/10.1007/s10772-018-9494-9