Improving short utterance speaker verification by combining MFCC and Entrocy in Noisy conditions

Al-karawi, Khamis A.; Mohammed, Duraid Y.

doi:10.1007/s11042-021-10767-6

Published: 25 March 2021

Volume 80, pages 22231–22249 (2021)
Multimedia Tools and Applications

Abstract

Short utterances and background noise pose serious challenges for speaker verification because they introduce a mismatch between training and testing conditions and limit the amount of training and/or test data. Automatic speaker verification generally achieves remarkable performance when training and testing conditions are matched; however, mismatched noisy conditions and short utterances tend to degrade results significantly. Performance also depends heavily on feature extraction. The most common features in this field are Mel-Frequency Cepstral Coefficients (MFCCs), but in the presence of background noise and short utterances, MFCCs are not reliable without a supporting feature. To address this, a new feature, ‘Entrocy’, is proposed for accurate and robust speaker verification under limited data and noisy environments, and is employed to support the MFCC coefficients. The Entrocy feature is the Fourier transform of the entropy, which measures how the information in sound segments fluctuates over time. The resulting Entrocy features are combined with MFCCs to generate a composite feature, which is tested using a Gaussian Mixture Model (GMM) recognition method. The proposed method was evaluated over a range of signal-to-noise ratios, with utterances truncated to short durations (2, 3, 4, 5, 6, 8, and 10 s) for verification. The proposed method showed strong robustness to background noise and limited test data, and it consistently performed better than the well-known MFCC features.
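The abstract describes Entrocy only at a high level, so the Python sketch below illustrates one plausible reading under stated assumptions. It estimates the entropy of short non-overlapping segments from an amplitude histogram, takes a naive DFT over a fixed window of those entropy values, and keeps the first few magnitudes as the per-frame feature. It also sketches the two evaluation conditions the abstract mentions: mixing noise at a target SNR and truncating test utterances to a fixed duration. All function names (`entrocy`, `combine`, `add_noise_at_snr`, `truncate`, etc.) and parameter values are hypothetical and are not the authors' implementation. MFCC extraction and GMM training and scoring are not shown.

```python
import cmath
import math


def shannon_entropy(segment, n_bins=16):
    """Shannon entropy (bits) of an amplitude histogram of one short segment.

    Assumption: equal-width amplitude bins; the paper's exact estimator may differ.
    """
    lo, hi = min(segment), max(segment)
    if hi == lo:
        return 0.0  # a constant segment carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in segment:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def entropy_track(signal, seg_len):
    """Entropy of consecutive non-overlapping segments (information over time)."""
    return [shannon_entropy(signal[i:i + seg_len])
            for i in range(0, len(signal) - seg_len + 1, seg_len)]


def dft_magnitudes(xs):
    """Naive O(n^2) DFT magnitude spectrum; adequate for short entropy windows."""
    n = len(xs)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(xs)))
            for k in range(n)]


def entrocy(signal, seg_len=64, segs_per_frame=16, n_coeffs=8):
    """Hypothetical Entrocy sketch: |DFT| of the entropy track inside each frame.

    seg_len, segs_per_frame and n_coeffs are illustrative values, not the paper's.
    """
    track = entropy_track(signal, seg_len)
    frames = []
    for start in range(0, len(track) - segs_per_frame + 1, segs_per_frame):
        window = track[start:start + segs_per_frame]
        frames.append(dft_magnitudes(window)[:n_coeffs])
    return frames


def combine(mfcc_frames, entrocy_frames):
    """Composite feature: per-frame concatenation (assumes frames are already aligned)."""
    return [list(m) + list(e) for m, e in zip(mfcc_frames, entrocy_frames)]


def mean_power(xs):
    """Average power (mean of squared samples)."""
    return sum(x * x for x in xs) / len(xs)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals snr_db, then add it."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    scale = math.sqrt(mean_power(clean) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def truncate(signal, seconds, sample_rate):
    """Keep only the first `seconds` of a test utterance (e.g. 2, 3, ..., 10 s)."""
    n = int(seconds * sample_rate)
    if n > len(signal):
        raise ValueError("utterance shorter than requested duration")
    return signal[:n]
```

A typical call chain would be `noisy = add_noise_at_snr(clean, noise, 5)` followed by `combine(mfcc_frames, entrocy(truncate(noisy, 2, 16000)))`, where `mfcc_frames` comes from any MFCC front end and must be aligned to the Entrocy frame rate by the caller. Concatenation is only the simplest way to form a composite vector; the paper's fusion scheme may differ.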

Similar content being viewed by others

Using combined features to improve speaker verification in the face of limited reverberant data (Article, 01 September 2023)

Robust Speaker Recognition Using Improved GFCC and Adaptive Feature Selection

Channel Robust MFCCs for Continuous Speech Speaker Recognition

Author information

Authors and Affiliations

Diyala University, Baqubah, Diyala, Iraq
Khamis A. Al-karawi
School of Education for Women, Al-Iraqia University, Baghdad, Iraq
Duraid Y. Mohammed

Corresponding author

Correspondence to Khamis A. Al-karawi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Al-karawi, K.A., Mohammed, D.Y. Improving short utterance speaker verification by combining MFCC and Entrocy in Noisy conditions. Multimed Tools Appl 80, 22231–22249 (2021). https://doi.org/10.1007/s11042-021-10767-6

Received: 18 May 2020
Revised: 22 December 2020
Accepted: 25 February 2021
Published: 25 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-021-10767-6
