Abstract
Active research in speaker recognition has grown rapidly, driven mainly by the increasing need for zero-touch device interfaces and mobile biometric authentication systems. This paper discusses the implementation of a text-independent speaker verification system that uses a long short-term memory (LSTM)-based neural network for speaker modeling, combined with several front-end feature extraction approaches: Mel Frequency Spectral Coefficients (MFSC), Mel Frequency Cepstral Coefficients (MFCC), Gammatone Filter Spectra (GTF), and Gammatone Filter Cepstral Coefficients (GFCC). To determine the best-suited speaker verification system for a given noisy environment, every feature and model combination is tested under a clean condition and under induced white noise at −20 and −40 dB. The results show that the MFSC-based LSTM-RNN combination tends to perform better than all the other combinations, regardless of the noise added to the dataset.
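The noisy test conditions can be illustrated with a minimal sketch. It assumes that "white noise at −20 and −40 dB" means Gaussian white noise whose power is that many decibels relative to the mean power of the clean signal; the abstract does not state the reference level, so this reading is an assumption. The function name `add_white_noise` and the toy sine input are hypothetical and are not taken from the paper.

```python
import numpy as np


def add_white_noise(signal, noise_level_db, rng=None):
    """Return `signal` plus white Gaussian noise.

    Assumption: `noise_level_db` is the noise power in dB relative to the
    mean power of `signal` (so -20 means noise power is 1/100 of signal power).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # Convert the relative level in dB to a linear power ratio.
    noise_power = signal_power * 10.0 ** (noise_level_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Toy stand-in for an utterance: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)

noisy_20 = add_white_noise(clean, -20.0, rng=np.random.default_rng(0))
noisy_40 = add_white_noise(clean, -40.0, rng=np.random.default_rng(0))


def measured_level_db(noisy, reference):
    """Noise power relative to the reference signal power, in dB."""
    noise = noisy - reference
    return 10.0 * np.log10(np.mean(noise ** 2) / np.mean(reference ** 2))
```

Calling `measured_level_db(noisy_20, clean)` should return a value close to −20, with a small deviation because the noise is a finite random sample. Under this reading, each feature front end would then be computed on `clean`, `noisy_20`, and `noisy_40` to reproduce the three test conditions.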
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dua, M., Sethi, P.S., Agrawal, V., Chawla, R. (2021). Speaker Recognition Using Noise Robust Features and LSTM-RNN. In: Panigrahi, C.R., Pati, B., Pattanayak, B.K., Amic, S., Li, KC. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 1299. Springer, Singapore. https://doi.org/10.1007/978-981-33-4299-6_2
DOI: https://doi.org/10.1007/978-981-33-4299-6_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4298-9
Online ISBN: 978-981-33-4299-6
eBook Packages: Engineering, Engineering (R0)