Abstract
Short utterances and background noise pose major challenges for speaker verification because of condition mismatch and limited training and/or test data. Automatic speaker verification generally achieves high performance when training and testing conditions are matched. However, mismatched noisy conditions and short utterances tend to degrade the results significantly. Performance also depends strongly on feature extraction. The most common features in this field are Mel-Frequency Cepstral Coefficients (MFCCs). In the presence of background noise and short utterances, MFCCs alone may not be reliable without a supporting feature. To address this, a new feature, ‘Entrocy’, is proposed for accurate and robust speaker verification under limited data and noisy environments, and is employed to support the MFCCs. The Entrocy feature is the Fourier transform of the entropy, and captures how the information in the sound segments fluctuates over time. The resulting Entrocy features are combined with the MFCC features to form a composite feature, which is tested using a Gaussian Mixture Model (GMM) recognition method. The proposed method was evaluated over a range of signal-to-noise ratios, with utterances truncated to short durations (2, 3, 4, 5, 6, 8, and 10 s) for verification. The proposed method showed strong robustness to background noise and limited testing data, and it consistently performed better than the well-known MFCC features alone.
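As a rough illustration of the idea described in the abstract, the following Python sketch computes a per-frame entropy track and then takes the Fourier transform of that track over blocks of frames. It is not the authors' implementation: the abstract does not specify which entropy is used, so the choice of Shannon entropy of the normalised power spectrum, the frame and hop sizes, the block length, and the names `frame_signal`, `spectral_entropy`, and `entrocy_sketch` are all assumptions made for this example.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_entropy(frames, eps=1e-12):
    """Shannon entropy (bits) of each frame's normalised power spectrum.

    Assumption: the paper's entropy definition may differ from this one.
    """
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log2(p + eps)).sum(axis=1)


def entrocy_sketch(x, sr, frame_ms=25, hop_ms=10, block=32, n_coeffs=8):
    """Fourier magnitude of the entropy track, one row per block of frames.

    All parameter values here are illustrative assumptions.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    entropy_track = spectral_entropy(frame_signal(x, frame_len, hop))

    # Group consecutive entropy values into non-overlapping blocks.
    n_blocks = len(entropy_track) // block
    blocks = entropy_track[: n_blocks * block].reshape(n_blocks, block)

    # Remove each block's mean so the spectrum describes fluctuation only,
    # then keep the first few non-DC magnitude coefficients.
    blocks = blocks - blocks.mean(axis=1, keepdims=True)
    return np.abs(np.fft.rfft(blocks, axis=1))[:, 1 : n_coeffs + 1]
```

For a 2 s signal sampled at 16 kHz, `entrocy_sketch(x, 16000)` returns one row of eight coefficients per block of 32 frames. In a composite-feature setup such rows would still need to be aligned with the MFCC frames before being passed to a GMM; how the paper performs that alignment is not stated in the abstract.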
Cite this article
Al-karawi, K.A., Mohammed, D.Y. Improving short utterance speaker verification by combining MFCC and Entrocy in Noisy conditions. Multimed Tools Appl 80, 22231–22249 (2021). https://doi.org/10.1007/s11042-021-10767-6
DOI: https://doi.org/10.1007/s11042-021-10767-6