Natural Language Processing: Speaker, Language, and Gender Identification with LSTM

Advanced Computing and Systems for Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 883)

Abstract

Long short-term memory (LSTM) is a state-of-the-art network used for a range of tasks in natural language processing (NLP), pattern recognition, and classification. It has also been applied successfully to speech recognition and speaker identification. The amount of training data and the ratio of training to test data remain key factors in achieving good results, and both have implications for real-world use. The main contribution of this paper is a high speaker recognition rate for text-independent continuous speech using a small ratio of training to test data, obtained by applying a long short-term memory recurrent neural network. The LSTM is also compared with a probabilistic feed-forward neural network on speaker recognition as well as on gender and language identification.
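
To make the idea of a small training-to-test ratio concrete, the following is a minimal sketch of a per-speaker split in Python. It is not the authors' procedure: the function name split_per_speaker, the 0.2 training fraction, and the toy utterance labels are all assumptions chosen for illustration, since the abstract does not state the actual ratio or data layout.

```python
import random
from collections import defaultdict


def split_per_speaker(utterances, train_fraction=0.2, seed=0):
    """Split (speaker_id, utterance) pairs into train and test sets.

    The split is done separately for each speaker so that every speaker
    appears in both sets. train_fraction is a hypothetical value, not one
    taken from the chapter. Each speaker keeps at least one training item.
    """
    by_speaker = defaultdict(list)
    for speaker_id, utterance in utterances:
        by_speaker[speaker_id].append(utterance)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for speaker_id in sorted(by_speaker):
        items = by_speaker[speaker_id][:]
        rng.shuffle(items)
        n_train = max(1, int(len(items) * train_fraction))
        train += [(speaker_id, u) for u in items[:n_train]]
        test += [(speaker_id, u) for u in items[n_train:]]
    return train, test


# Toy data (assumed): 3 speakers with 10 placeholder utterances each.
data = [(s, f"{s}_utt{i}") for s in ("spk_a", "spk_b", "spk_c") for i in range(10)]
train_set, test_set = split_per_speaker(data, train_fraction=0.2)
print(len(train_set), len(test_set))
```

Splitting per speaker rather than over the pooled data keeps every speaker represented in the small training set, which a closed-set identification task needs.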

Acknowledgements

This work was supported by grant S/WI/3/2018 from Bialystok University of Technology and funded with resources for research by the Ministry of Science and Higher Education in Poland.

Corresponding author

Correspondence to Mohammad K. Nammous.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Nammous, M.K., Saeed, K. (2019). Natural Language Processing: Speaker, Language, and Gender Identification with LSTM. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 883. Springer, Singapore. https://doi.org/10.1007/978-981-13-3702-4_9
