Natural Language Processing: Speaker, Language, and Gender Identification with LSTM

Nammous, Mohammad K.; Saeed, Khalid

doi:10.1007/978-981-13-3702-4_9

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 883)

Abstract

Long short-term memory (LSTM) is a state-of-the-art network used for tasks in natural language processing (NLP), pattern recognition, and classification. It has also been used successfully for speech recognition and speaker identification. The amount of training data and the ratio of training to test data remain key factors in achieving good results, but they also affect how practical a system is in real use. The main contribution of this paper is a high rate of speaker recognition for text-independent continuous speech using a small ratio of training to test data, obtained by applying a long short-term memory recurrent neural network. The approach is compared with the probabilistic feed-forward neural network for speaker recognition as well as for gender and language identification.
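
The abstract does not describe the probabilistic neural network baseline in detail. As a rough illustration of what such a classifier computes, the following is a minimal sketch of a Parzen-window probabilistic neural network trained on a small share of the data. The Gaussian kernel width `sigma`, the toy 2-D feature vectors, the speaker labels, and the function name `pnn_predict` are all assumptions made for this example and are not taken from the chapter.

```python
import math
from collections import defaultdict


def pnn_predict(train, x, sigma=0.5):
    """Classify x with a Parzen-window probabilistic neural network.

    train: list of (feature_vector, label) pairs.
    sigma: Gaussian kernel width (an assumed value, not from the chapter).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for features, label in train:
        # Pattern layer: Gaussian kernel centred on each training vector.
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        sums[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[label] += 1
    # Summation layer: average kernel activation per class.
    scores = {label: sums[label] / counts[label] for label in sums}
    # Decision layer: the class with the largest average activation wins.
    return max(scores, key=scores.get)


# Hypothetical 2-D feature vectors for two speakers (toy data).
samples = [
    ((0.0, 0.1), "speaker_a"), ((0.2, 0.0), "speaker_a"),
    ((0.1, 0.2), "speaker_a"), ((-0.1, 0.1), "speaker_a"),
    ((2.0, 2.1), "speaker_b"), ((2.2, 1.9), "speaker_b"),
    ((1.9, 2.0), "speaker_b"), ((2.1, 2.2), "speaker_b"),
]

# Small training-to-test ratio: one training vector per speaker,
# the remaining six vectors are held out for testing.
train = [samples[0], samples[4]]
test = samples[1:4] + samples[5:]

correct = sum(pnn_predict(train, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

On well-separated toy data like this a single training vector per class is enough; real speech features overlap far more, which is why the ratio of training to test data matters in practice.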

Acknowledgements

This work was supported by grant S/WI/3/2018 from Bialystok University of Technology and funded with resources for research by the Ministry of Science and Higher Education in Poland.

Author information

Authors and Affiliations

Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Sankt Augustin, Germany
Mohammad K. Nammous
Faculty of Mathematics and Information Sciences, Warsaw University of Technology, Warsaw, Poland
Mohammad K. Nammous
Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
Khalid Saeed

Corresponding author

Correspondence to Mohammad K. Nammous.

Editor information

Editors and Affiliations

A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Rituparna Chaki
Dipartimento di Scienze Ambientali, Informatica e Statistica, Università Ca’ Foscari, Mestre, Venice, Venezia, Italy
Agostino Cortesi
Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
Khalid Saeed
Department of Computer Science and Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki

About this chapter

Cite this chapter

Nammous, M.K., Saeed, K. (2019). Natural Language Processing: Speaker, Language, and Gender Identification with LSTM. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 883. Springer, Singapore. https://doi.org/10.1007/978-981-13-3702-4_9

DOI: https://doi.org/10.1007/978-981-13-3702-4_9
Published: 17 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3701-7
Online ISBN: 978-981-13-3702-4
eBook Packages: Intelligent Technologies and Robotics (R0)
