Text-Independent Speaker Recognition Using Deep Learning

Srivastava, Smriti; Chaudhary, Gopal; Shukla, Chandrakesh

doi:10.1007/978-3-030-76167-7_2

Part of the book series: EAI/Springer Innovations in Communication and Computing (EAISICC)

Abstract

Speaker recognition is the process of recognizing a speaker from speaker-specific information in the voice signal. Speaker recognition systems are classified as text-dependent or text-independent. In a text-dependent system, the recognition phrases are fixed or known beforehand; for example, the user can be prompted to read a randomly selected sequence of numbers. In a text-independent system, by contrast, there are no constraints on the words the speaker may use, so what is spoken in training and what is uttered in actual use may have completely different content. Speaker recognition is further divided into speaker identification and speaker verification. Speaker verification evaluates whether a voice belongs to a claimed person, while speaker identification determines which person the voice belongs to. In this work, Mel-frequency cepstral coefficients (MFCC) were extracted from the audio files. These features were then fed to a convolutional neural network (CNN), which was optimized to increase model accuracy. Over six runs with varying parameters, a maximum accuracy of approximately 97% was achieved.
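The abstract does not give the MFCC settings used, so the following is only a minimal NumPy sketch of the usual MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT). The function names, the 16 kHz sample rate, the 25 ms / 10 ms framing, the 26 filters, and the 13 coefficients are all assumptions chosen as common defaults, not values taken from the chapter.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula (O'Shaughnessy variant).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    # Unnormalized DCT-II along the last axis, keeping the first n_out terms.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_coeffs=13):
    # All default parameter values are assumptions, not the chapter's settings.
    # Pre-emphasis boosts high frequencies.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(signal) - flen) // fstep
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, log compression, then DCT to decorrelate.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    return dct_ii(log_energies, n_coeffs)

# Demo on one second of synthetic audio (a 220 Hz tone plus light noise).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
demo = np.sin(2 * np.pi * 220 * t) + 0.01 * rng.standard_normal(16000)
features = mfcc(demo)
print(features.shape)  # (frames, coefficients)
```

The result is a two-dimensional array with one row per frame and one column per coefficient. A CNN classifier of the kind the abstract describes could treat such an array as a single-channel image, though the network architecture itself is not specified here.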

Author information

Authors and Affiliations

Department of Instrumentation and Control Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi, India
Smriti Srivastava
Bharati Vidyapeeth’s College of Engineering, New Delhi, India
Gopal Chaudhary
Department of Electrical Engineering, Indian Institute of Technology (Indian School of Mines) Dhanbad, Dhanbad, Jharkhand, India
Chandrakesh Shukla

Editor information

Editors and Affiliations

Instrumentation and Control Engineering, Netaji Subhas University of Technology, New Delhi, Delhi, India
Smriti Srivastava
School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, Delhi, India
Manju Khari
Department of Computer Science and Technology, Universidad Internacional de La Rioja, Logroño, La Rioja, Spain
Ruben Gonzalez Crespo
Department of Electronics and Communication Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi, Delhi, India
Gopal Chaudhary
Department of Electronics and Communication Engineering, Jaypee Institute of Information Technology, Noida, India
Parul Arora

About this chapter

Cite this chapter

Srivastava, S., Chaudhary, G., Shukla, C. (2021). Text-Independent Speaker Recognition Using Deep Learning. In: Srivastava, S., Khari, M., Gonzalez Crespo, R., Chaudhary, G., Arora, P. (eds) Concepts and Real-Time Applications of Deep Learning. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-76167-7_2

DOI: https://doi.org/10.1007/978-3-030-76167-7_2
Published: 24 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76166-0
Online ISBN: 978-3-030-76167-7
eBook Packages: Intelligent Technologies and Robotics (R0)
