Abstract
Speaker identification is the task of determining a speaker's identity from sequential speech (time-series) data. Most modern experimental approaches rely on two families of neural networks: convolutional neural networks (CNNs) and deep neural networks (DNNs). This work presents a speaker-identification model based on a jump-connected (skip-connected) one-dimensional convolutional neural network (1-D CNN) with a focus module (FM). In the proposed model, the 1-D convolutional layers integrated with the FM extract speaker characteristics while reducing heterogeneity in the temporal and spatial domains, allowing faster layer processing. In addition, jump connections between the stacked CNN layers mitigate connectivity glitches, and a joint objective combining softmax loss with smooth L1-norm regularization is introduced to improve efficiency. The proposed network was evaluated on the ELSDSR, TIMIT, NIST, 16,000 PCM, and experimental audio datasets. According to the experimental results, the end-to-end CNN improves the equal error rate (EER) for voiceprint identification by 9.02% relative to baseline approaches. In our experiments, the proposed speaker recognition (SR) model, which we refer to as the deep FM-1D CNN, achieved a high recognition accuracy of 99.21%. The observations further demonstrate that the proposed network is more robust than competing models.
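The abstract names three ingredients: 1-D convolution for feature extraction, a jump (skip) connection between layers, and a joint loss combining softmax with a smooth L1 penalty. The following is a minimal NumPy sketch of how those pieces fit together; every function name, shape, and coefficient here is illustrative, not the authors' implementation.

```python
import numpy as np

def conv1d(x, w):
    # 'valid' 1-D convolution of signal x with kernel w
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def softmax(z):
    # numerically stable softmax over class logits
    e = np.exp(z - z.max())
    return e / e.sum()

def smooth_l1(w, beta=1.0):
    # Huber-style smooth L1 penalty: quadratic near zero, linear beyond beta
    a = np.abs(w)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta).sum()

def forward(x, w, proj):
    h = conv1d(x, w)
    # jump (skip) connection: add a cropped copy of the input back in
    h = h + x[:len(h)]
    # project features to class logits and normalize to probabilities
    return softmax(proj @ h)

def joint_loss(probs, y, w, lam=0.01):
    # softmax cross-entropy plus lambda-weighted smooth L1 regularization
    return -np.log(probs[y] + 1e-12) + lam * smooth_l1(w)
```

A real model would stack many such conv/FM blocks and learn `w` and `proj` by gradient descent; the sketch only shows the data flow and the combined regularization that the abstract describes.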
Availability of data and materials
The authors do not have permission to share data.
Funding
Not applicable.
Author information
Contributions
Karthikeyan Velayuthapandian contributed to conceptualisation, methodology/study design, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review and editing, and visualisation. Suja Priyadharsini Subramoniam contributed to conceptualisation, validation, formal analysis, investigation, resources, writing—review and editing, visualisation, and supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
About this article
Cite this article
Velayuthapandian, K., Subramoniam, S.P. A focus module-based lightweight end-to-end CNN framework for voiceprint recognition. SIViP 17, 2817–2825 (2023). https://doi.org/10.1007/s11760-023-02500-7