Improving short utterance speaker verification by combining MFCC and Entrocy in Noisy conditions

Multimedia Tools and Applications

Abstract

Short utterances and background noise pose major challenges for speaker verification because of condition mismatch and limited training and/or test data. Automatic speaker verification generally performs very well when training and testing conditions are matched. However, mismatched noisy conditions and short utterances tend to degrade the results significantly. Performance also depends strongly on feature extraction. The most common features in this field are Mel-Frequency Cepstral Coefficients (MFCCs). With background noise and short utterances, MFCCs alone may not be reliable without a supporting feature. To address this, a new feature, ‘Entrocy’, is proposed for accurate and robust speaker verification under limited data and noisy environments, and is employed to support the MFCCs. The Entrocy feature is the Fourier transform of the entropy; it captures the fluctuation of the information in the sound segments over time. The resulting Entrocy features are combined with the MFCCs to generate a composite feature, which is tested using a Gaussian Mixture Model (GMM) recognizer. The method was evaluated over a range of signal-to-noise ratios, with utterances truncated to short durations (2, 3, 4, 5, 6, 8, and 10 s) for verification. The proposed method shows strong robustness to background noise and limited test data, and it consistently performs better than the well-known MFCC features.
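
The abstract describes Entrocy only as the Fourier transform of the entropy over time, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes a per-frame spectral (Shannon) entropy as the entropy measure, and it takes the magnitude spectrum of that entropy contour over fixed blocks of frames. The function names and all parameters (`frame_ms`, `hop_ms`, `block`, `n_coeffs`) are hypothetical choices made for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_entropy(frames, eps=1e-12):
    """Shannon entropy (in bits) of each frame's normalized power spectrum."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log2(p + eps)).sum(axis=1)


def entrocy_sketch(x, sr=16000, frame_ms=25, hop_ms=10, block=32, n_coeffs=8):
    """Assumed reading of 'Fourier transform of the entropy'.

    1. Compute an entropy value per frame (the entropy contour).
    2. Group the contour into blocks of `block` consecutive frames.
    3. Return the magnitude of the first `n_coeffs` non-DC Fourier
       coefficients of each mean-removed block.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    contour = spectral_entropy(frame_signal(x, frame_len, hop))
    n_blocks = len(contour) // block
    blocks = contour[: n_blocks * block].reshape(n_blocks, block)
    blocks = blocks - blocks.mean(axis=1, keepdims=True)  # drop the DC offset
    spectrum = np.abs(np.fft.rfft(blocks, axis=1))
    return spectrum[:, 1 : n_coeffs + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.standard_normal(2 * 16000)  # stand-in for a 2 s utterance
    print(entrocy_sketch(utterance).shape)
```

How these block-level values would be aligned and combined with frame-level MFCCs is not stated in the abstract; concatenating them after repeating each block's values across its frames would be one option, but that too is an assumption.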

Corresponding author

Correspondence to Khamis A. Al-karawi.

Cite this article

Al-karawi, K.A., Mohammed, D.Y. Improving short utterance speaker verification by combining MFCC and Entrocy in Noisy conditions. Multimed Tools Appl 80, 22231–22249 (2021). https://doi.org/10.1007/s11042-021-10767-6
