Abstract
Speaker recognition is the process of recognizing a speaker from speaker-specific information in the voice. Speaker recognition systems can be classified as text-dependent or text-independent. In a text-dependent system, the recognition phrases are fixed or known beforehand; for example, the user can be prompted to read a randomly selected sequence of numbers. In a text-independent system, by contrast, there are no constraints on the words the speakers may use, so what is spoken in training and what is uttered in actual use may have completely different content. The domain of speaker recognition can be further divided into speaker identification and speaker verification. Speaker verification evaluates whether a voice belongs to a claimed person, while speaker identification tries to determine which person the voice belongs to. In this paper, Mel-frequency cepstral coefficients (MFCC) were extracted from the audio files. These features were then fed to a convolutional neural network (CNN), which was optimized to increase model accuracy. Across six runs with varying parameters, a maximum accuracy of approximately 97% was achieved.
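The MFCC extraction step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the chapter's implementation: the abstract does not give the extraction settings, so the 16 kHz sample rate, 25 ms frames, 10 ms hop, 26 mel filters, and 13 coefficients below are assumed common defaults, and the function names (`hz_to_mel`, `mel_filterbank`, `mfcc`) are hypothetical. The input is a synthetic tone rather than real speech.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix. All defaults are assumptions.

    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Log energy in each mel band (small constant avoids log(0)).
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II over the mel bands; keep the first n_coeffs coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energy @ basis.T

# One second of a synthetic 440 Hz tone stands in for a speech recording.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone, sample_rate=sample_rate)
print(features.shape)
```

The result is a two-dimensional array of frames by coefficients, which is the kind of image-like input a CNN classifier could consume; how the chapter shapes or normalizes its features is not stated in the abstract.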
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Srivastava, S., Chaudhary, G., Shukla, C. (2021). Text-Independent Speaker Recognition Using Deep Learning. In: Srivastava, S., Khari, M., Gonzalez Crespo, R., Chaudhary, G., Arora, P. (eds) Concepts and Real-Time Applications of Deep Learning. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-76167-7_2
DOI: https://doi.org/10.1007/978-3-030-76167-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76166-0
Online ISBN: 978-3-030-76167-7
eBook Packages: Intelligent Technologies and Robotics (R0)