An efficient speaker identification framework based on Mask R-CNN classifier parameter optimized using hosted cuckoo optimization (HCO)

Gaurav; Bhardwaj, Saurabh; Agarwal, Ravinder

doi:10.1007/s12652-022-03828-7

Original Research
Published: 05 April 2022

Volume 14, pages 13613–13625, (2023)
Journal of Ambient Intelligence and Humanized Computing

Gaurav¹,
Saurabh Bhardwaj¹ &
Ravinder Agarwal¹

369 Accesses
3 Citations

Abstract

Speaker identification is the task of recognizing a person from their voice with the help of artificial intelligence (AI) methods. The technology is widely used in voice recognition, security, surveillance, electronic voice eavesdropping, and identity verification. Existing methods do not provide sufficient accuracy or robustness for speech signals. To overcome these issues, this manuscript proposes an efficient speaker identification framework based on a Mask region-based convolutional neural network (Mask R-CNN) classifier whose parameters are optimized using Hosted Cuckoo Optimization (HCO). The objective of the proposed method is to increase the accuracy and to improve the robustness of the signal. First, the input speech signals are taken from a real-time dataset. Four types of features are then extracted from the input speech signal to improve robustness: Mel Frequency Differential Power Cepstral Coefficients (MFDPCC), Gammatone Frequency Cepstral Coefficients (GFCC), Power Normalized Cepstral Coefficients (PNCC), and spectral entropy. The speaker ID is then classified using the Mask R-CNN classifier, and the classifier parameters are optimized using the HCO algorithm. The method is relevant to real-time applications such as telephone banking and fax mailing. The simulation is executed in MATLAB.
The simulation results show that the proposed Mask-R-CNN-HCO method attains accuracy 24.16%, 32.18%, 28.43%, 36.4%, and 33.26% higher, sensitivity 37.68%, 33.80%, 24.16%, 32.18%, and 28.43% higher, and precision 35.88%, 24.16%, 32.18%, 28.43%, and 26.77% higher than the existing methods: automatic classification of speaker identification using the K-Nearest Neighbors (KNN) algorithm, classification of speaker identification using a multiclass support vector machine (MCSVM), classification of speaker identification using a Gaussian Mixture Model–Convolutional Neural Network (GMMCNN) classifier, classification of speaker identification using a deep neural network (DNN), and classification of speaker identification using a Gaussian Mixture Model–Deep Neural Network (GMMDNN) classifier.
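Of the four feature types named in the abstract, spectral entropy is the simplest to state on its own. The sketch below is an illustration only, not the authors' implementation (the paper's simulation is in MATLAB, and its exact definition, frame length, and windowing are not given here). It assumes the common definition: the Shannon entropy of a frame's normalized power spectrum, divided by its maximum so the value lies between 0 and 1. The function names `spectral_entropy` and `frame_entropies`, the 512-sample frame, the 256-sample hop, and the Hann window are all assumptions made for this example.

```python
import numpy as np


def spectral_entropy(frame):
    """Return the normalized Shannon entropy (0..1) of a frame's power spectrum.

    Assumes the frame has at least two samples.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    total = power.sum()
    if total == 0.0:
        return 0.0  # all-zero frame: entropy set to 0 by convention in this sketch
    p = power / total            # treat the spectrum as a probability distribution
    nz = p[p > 0]                # skip empty bins so log2 is defined
    entropy_bits = -np.sum(nz * np.log2(nz))
    return float(entropy_bits / np.log2(p.size))  # divide by the maximum, log2(bins)


def frame_entropies(signal, frame_len=512, hop=256):
    """Spectral entropy of each Hann-windowed frame of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array(
        [spectral_entropy(signal[s:s + frame_len] * window) for s in starts]
    )


if __name__ == "__main__":
    n = np.arange(512)
    tone = np.sin(2 * np.pi * 32 * n / 512)  # energy concentrated in one FFT bin
    noise = np.random.default_rng(0).standard_normal(512)  # energy spread widely
    print("tone :", spectral_entropy(tone))
    print("noise:", spectral_entropy(noise))
```

A pure tone concentrates its energy in very few frequency bins and scores close to 0, while noise spreads energy across the spectrum and scores close to 1. This contrast is the usual reason the measure is paired with cepstral features when robustness to noise is a goal.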

Data availability

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Code accessibility

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, India
Gaurav, Saurabh Bhardwaj & Ravinder Agarwal

Corresponding author

Correspondence to Gaurav.

Ethics declarations

Conflict of interest

Not applicable.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gaurav, Bhardwaj, S. & Agarwal, R. An efficient speaker identification framework based on Mask R-CNN classifier parameter optimized using hosted cuckoo optimization (HCO). J Ambient Intell Human Comput 14, 13613–13625 (2023). https://doi.org/10.1007/s12652-022-03828-7

Received: 23 September 2021
Accepted: 10 March 2022
Published: 05 April 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s12652-022-03828-7
