A speaker identification-verification approach for noise-corrupted and improved speech using fusion features and a convolutional neural network

Nisa, Rohun; Baba, Asifa Mehraj

doi:10.1007/s41870-024-01877-z

A speaker identification-verification approach for noise-corrupted and improved speech using fusion features and a convolutional neural network

Original Research
Published: 19 May 2024

(2024)
Cite this article

International Journal of Information Technology Aims and scope Submit manuscript

14 Accesses
Explore all metrics

Abstract

The degraded quality of the speech input signal has a negative impact on speaker recognition techniques. We address the issues of speaker recognition from noise-corrupted audio signals in the presence of four noise variants, including factory noise, car noise, street traffic noise, and voice babble noise, as well as noise-suppressed enhanced speech. The goal of this research is to create a speaker recognition algorithm that is resistant to a diverse spectrum of speech capture quality, background scenarios, and interferences. In this work, three distinct features, including Mel Frequency Cepstral Coefficients (MFCC), Normalized Pitch Frequency (NPF), and Normalized Phase Cepstral Coefficients (NPCC) are combined. The analysis that MFCC, NPF, and NPCC illustrate distinct features of speech underlies our observation. A Convolutional Neural Network (CNN) is used in our speaker recognition strategy to learn speaker-dependent attributes from fragments of Mel features, normalized pitch features, and phase cepstral features of clean speech, corrupted speech, and enhanced speech. The performance is measured using the ITU-T test signals and compared to previous algorithms at different Signal-to-Noise-Ratios of 0 dB, 5 dB, 10 dB, and 15 dB. For enhanced speech, all three features, MFCC, NPF, and NPCC, provided productive speaker identification and verification performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 3

A text independent speaker identification system using ANN, RNN, and CNN classification technique

Article 02 November 2023

Text-Independent Speaker Recognition Using Deep Learning

A deep learning approach for robust speaker identification using chroma energy normalized statistics and mel frequency cepstral coefficients

Article 30 August 2021

Data availability

The datasets generated during and/or analysed during the current study are available in the [ITU-T Test Signals for Telecommunication Systems] repository [https://www.itu.int/net/itu-t/sigdb/genaudio/Pseries.htm].

Notes

References

Jayanna HS, Prasanna SM (2009) Analysis, feature extraction, modeling and testing techniques for speaker recognition. IETE Tech Rev 26(3):181–190. https://doi.org/10.4103/0256-4602.50702
Article Google Scholar
Singh N, Khan RA, Shree R (2012) MFCC and prosodic feature extraction techniques: a comparative study. Int J Comput Appl 54(1):9–13
Google Scholar
Hasan M R, Jamil M, Rabbani MG, Rahman MS (2004) Speaker identification using Mel frequency cepstral coefficients. In: ICECE international conference on electrical & computer engineering, December 2004, pp 565–568
Krishnamurthy N, Hansen JH (2009) Babble noise: modeling, analysis, and applications. IEEE Trans Audio Speech Lang Process 17(7):1394–1407. https://doi.org/10.1109/TASL.2009.2015084
Article Google Scholar
Yutai W, Bo L, Xiaoqing J et al (2009) Speaker recognition based on dynamic MFCC parameters. In: IEEE international conference on image analysis and signal processing, April 2009. pp 406–409. https://doi.org/10.1109/IASP.2009.5054638
Reynolds DA, Quatieri TF, Dunn RB (2000) Speaker verification using adapted Gaussian mixture models. Digit Signal Process 10(1–3):19–41. https://doi.org/10.1006/dspr.1999.0361
Article Google Scholar
Campbell WM, Campbell JP, Reynolds DA et al (2006) Support vector machines for speaker and language recognition. Comput Speech Lang 20(2–3):210–229. https://doi.org/10.1016/j.csl.2005.06.003
Article Google Scholar
Campbell WM, Sturim DE, Reynolds DA (2006) Support vector machines using GMM supervectors for speaker verification. IEEE Signal Process Lett 13(5):308–311. https://doi.org/10.1109/LSP.2006.870086
Article Google Scholar
Dehak N, Dehak R, Glass JR et al (2010) Cosine similarity scoring without score normalization techniques. In: Odyssey, June 2010. p 15
Daqrouq K, Tutunji TA (2015) Speaker identification using vowels features through a combined method of formants, wavelets, and neural network classifiers. Appl Soft Comput 27:231–239. https://doi.org/10.1016/j.asoc.2014.11.016
Article Google Scholar
Ajmera PK, Jadhav DV, Holambe RS (2011) Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram. Pattern Recognit 44(10–11):2749–2759. https://doi.org/10.1016/j.patcog.2011.04.009
Article Google Scholar
Tirumala SS, Shahamiri SR, Garhwal AS, Wang R (2017) Speaker identification features extraction methods: a systematic review. Expert Syst Appl 90:250–271. https://doi.org/10.1016/j.eswa.2017.08.015
Article Google Scholar
Jia Y, Chen X, Yu J et al (2021) Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network. Complex Intell Syst 7:1749–1757. https://doi.org/10.1007/s40747-020-00172-1
Article Google Scholar
Richardson F, Reynolds D, Dehak N (2015) Deep neural network approaches to speaker and language recognition. IEEE Signal Process Lett 22(10):1671–1675. https://doi.org/10.1109/LSP.2015.2420092
Article Google Scholar
Ahmad KS, Thosar AS, Nirmal JH, Pande VS (2015) A unique approach in text independent speaker recognition using MFCC feature sets and probabilistic neural network. In: IEEE eighth international conference on advances in pattern recognition, January 2015. pp 1–6. https://doi.org/10.1109/ICAPR.2015.7050669
Soleymanpour M, Marvi H (2017) Text-independent speaker identification based on selection of the most similar feature vectors. Int J Speech Technol 20:99–108. https://doi.org/10.1007/s10772-016-9385-x
Article Google Scholar
Liu Z, Wu Z, Li T, Li J, Shen C (2018) GMM and CNN hybrid method for short utterance speaker recognition. IEEE Trans Industr Inform 14(7):3244–3252. https://doi.org/10.1109/TII.2018.2799928
Article Google Scholar
Ali H, Tran SN, Benetos E et al (2018) Speaker recognition with hybrid features from a deep belief network. Neural Comput Appl 29:13–19. https://doi.org/10.1007/s00521-016-2501-7
Article Google Scholar
Siam AI, El-khobby HA, Elnaby MMA et al (2019) A novel speech enhancement method using Fourier series decomposition and spectral subtraction for robust speaker identification. Wirel Pers Commun 108:1055–1068. https://doi.org/10.1007/s11277-019-06453-4
Article Google Scholar
Kenny P (2010) Bayesian speaker verification with, heavy tailed priors. In: Proceedings Odyssey, 2010
Taherian H, Wang ZQ, Chang J, Wang D (2020) Robust speaker recognition based on single-channel and multi-channel speech enhancement. IEEE/ACM Trans Audio Speech Lang Process 28:1293–1302. https://doi.org/10.1109/TASLP.2020.2986896
Article Google Scholar
El-Moneim SA, Nassar MA, Dessouky MI et al (2020) Text-independent speaker recognition using LSTM-RNN and speech enhancement. Multimedia Tools Appl 79:24013–24028. https://doi.org/10.1007/s11042-019-08293-7
Article Google Scholar
Hourri S, Nikolov NS, Kharroubi J (2021) Convolutional neural network vectors for speaker recognition. Int J Speech Technol 24:389–400. https://doi.org/10.1007/s10772-021-09795-2
Article Google Scholar
Juneja K (2022) Two-level noise robust and block featured PNN model for speaker recognition in real environment. Wirel Pers Commun 125(4):3741–3771. https://doi.org/10.1007/s11277-022-09734-7
Article Google Scholar
Hamidi M, Zealouk O, Satori H et al (2023) COVID-19 assessment using HMM cough recognition system. Int J Inf Technol 15(1):193–201. https://doi.org/10.1007/s41870-022-01120-7
Article Google Scholar
Al-Shakarchy ND, Obayes HK, Abdullah ZN (2023) Person identification based on voice biometric using deep neural network. Int J Inf Technol 15(2):789–795. https://doi.org/10.1007/s41870-022-01142-1
Article Google Scholar
Radha K, Bansal M (2023) Closed-set automatic speaker identification using multi-scale recurrent networks in non-native children. Int J Inf Technol 15(3):1375–1385. https://doi.org/10.1007/s41870-023-01224-8
Article Google Scholar
Chelali FZ (2023) Bimodal fusion of visual and speech data for audiovisual speaker recognition in noisy environment. Int J Inf Technol. https://doi.org/10.1007/s41870-023-01291-x
Article Google Scholar
Nakagawa S, Wang L, Ohtsuka S (2011) Speaker identification and verification by combining MFCC and phase information. IEEE Trans Audio Speech Lang Process 20(4):1085–1095. https://doi.org/10.1109/TASL.2011.2172422
Article Google Scholar
Wu Z, Chng ES, Li H (2012) Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition. In: Thirteenth annual conference of the international speech communication association, 2012
ITU-T P-series recommendations. https://www.itu.int/net/itu-t/sigdb/genaudio/Pseries.htm. Accessed 26 July 2020
Gibiansky A, Arik S, Diamos G et al (2017) Deep voice 2: multi-speaker neural text-to-speech. Adv Neural Inf Process 30
Nisa R, Showkat H, Baba A (2023) The speech signal enhancement approach with multiple sub-frames analysis for complex magnitude and phase spectrum recompense. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2023.120746
Article Google Scholar
Paliwal K, Wójcicki K (2008) Effect of analysis window duration on speech intelligibility. IEEE Signal Process Lett 15:785–788. https://doi.org/10.1109/LSP.2008.2005755
Article Google Scholar
Varga A, Steeneken HJ (1993) Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun 12(3):247–251. https://doi.org/10.1016/0167-6393(93)90095-3
Article Google Scholar

Download references

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Islamic University of Science and Technology, Awantipora, Jammu & Kashmir, India, 192122
Rohun Nisa & Asifa Mehraj Baba

Authors

Rohun Nisa
View author publications
You can also search for this author in PubMed Google Scholar
Asifa Mehraj Baba
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rohun Nisa.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Nisa, R., Baba, A.M. A speaker identification-verification approach for noise-corrupted and improved speech using fusion features and a convolutional neural network. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01877-z

Download citation

Received: 06 January 2024
Accepted: 09 April 2024
Published: 19 May 2024
DOI: https://doi.org/10.1007/s41870-024-01877-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A speaker identification-verification approach for noise-corrupted and improved speech using fusion features and a convolutional neural network

Abstract

Access this article

Similar content being viewed by others

A text independent speaker identification system using ANN, RNN, and CNN classification technique

Text-Independent Speaker Recognition Using Deep Learning

A deep learning approach for robust speaker identification using chroma energy normalized statistics and mel frequency cepstral coefficients

Data availability

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A speaker identification-verification approach for noise-corrupted and improved speech using fusion features and a convolutional neural network

Abstract

Access this article

Similar content being viewed by others

A text independent speaker identification system using ANN, RNN, and CNN classification technique

Text-Independent Speaker Recognition Using Deep Learning

A deep learning approach for robust speaker identification using chroma energy normalized statistics and mel frequency cepstral coefficients

Data availability

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation