Speaker Recognition Using Noise Robust Features and LSTM-RNN

Dua, Mohit; Sethi, Pawandeep Singh; Agrawal, Vinam; Chawla, Raghav

doi:10.1007/978-981-33-4299-6_2


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1299)


Abstract

Research in speaker recognition has grown tremendously, driven mainly by the increasing need for zero-touch interfaces in devices and for mobile biometric authentication systems. This paper discusses the implementation of a text-independent speaker verification system that uses a long short-term memory (LSTM)-based neural network for speaker modeling, combined with several front-end feature extraction approaches: Mel Frequency Spectral Coefficients (MFSC), Mel Frequency Cepstral Coefficients (MFCC), Gammatone Filter Spectra (GTF), and Gammatone Filter Cepstral Coefficients (GFCC). To determine which speaker verification system is best suited to a given noisy environment, every combination of feature type and LSTM model is tested under induced white noise at −20 and −40 dB, as well as under clean conditions. The results show that the MFSC-based LSTM-RNN combination tends to outperform all the other combinations regardless of the noise added to the dataset.
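
To make the front-end terminology concrete, the following is a minimal NumPy sketch of two steps the abstract names: corrupting a signal with white noise at a chosen level, and computing MFSC and MFCC features. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say whether the dB figures are signal-to-noise ratios or absolute levels, so the sketch assumes they give noise power relative to signal power. The function names, the 16 kHz sample rate, the 25 ms frames with a 10 ms hop, the 26 filters, and the 13 cepstral coefficients are all hypothetical choices. The gammatone features (GTF, GFCC) and the LSTM back end are not covered.

```python
import numpy as np

def add_white_noise(signal, noise_db, rng=None):
    """Add white Gaussian noise at `noise_db` dB relative to the signal power.

    Assumption: the dB figure is a noise-to-signal power ratio, which the
    abstract does not confirm.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power * 10.0 ** (noise_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def mfsc_and_mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
                  n_fft=512, n_filters=26, n_ceps=13):
    """Return (MFSC, MFCC): log mel-filterbank energies and their DCT-II."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # MFSC: log of the mel-weighted power spectrum (small offset avoids log(0)).
    mfsc = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # MFCC: unnormalized DCT-II over the filter axis, keeping the first n_ceps terms.
    n = np.arange(n_filters)
    dct_basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    mfcc = mfsc @ dct_basis.T
    return mfsc, mfcc

# Usage on a synthetic one-second tone standing in for speech.
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noisy = add_white_noise(clean, noise_db=-20.0)
mfsc, mfcc = mfsc_and_mfcc(noisy)
```

The only difference between the two feature types here is the final cosine transform: MFSC keeps the log filterbank energies as they are, while MFCC decorrelates them and truncates to the lowest-order coefficients.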



Author information

Authors and Affiliations

Computer Engineering Department, National Institute of Technology, Kurukshetra, Kurukshetra, India
Mohit Dua, Pawandeep Singh Sethi, Vinam Agrawal & Raghav Chawla


Corresponding author

Correspondence to Mohit Dua.

Editor information

Editors and Affiliations

Department of Computer Science, Rama Devi Women’s University, Bhubaneswar, Odisha, India
Chhabi Rani Panigrahi
Department of Computer Science, Rama Devi Women’s University, Bhubaneswar, Odisha, India
Bibudhendu Pati
Department of Computer Science and Engineering, S ‘O’ A Deemed to be University, Bhubaneswar, Odisha, India
Binod Kumar Pattanayak
Faculty of Information and Communication Technology, Université des Mascareignes, Pamplemousses, Mauritius
Seeven Amic
Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
Kuan-Ching Li


Cite this paper

Dua, M., Sethi, P.S., Agrawal, V., Chawla, R. (2021). Speaker Recognition Using Noise Robust Features and LSTM-RNN. In: Panigrahi, C.R., Pati, B., Pattanayak, B.K., Amic, S., Li, KC. (eds) Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing, vol 1299. Springer, Singapore. https://doi.org/10.1007/978-981-33-4299-6_2


DOI: https://doi.org/10.1007/978-981-33-4299-6_2
Published: 16 April 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4298-9
Online ISBN: 978-981-33-4299-6
eBook Packages: Engineering, Engineering (R0)
