Comparison of Deep Learning Methods for Spoken Language Identification

Part of the Lecture Notes in Computer Science book series (LNAI, volume 12335)

Abstract

In this paper, we implement and compare deep learning based spoken language identification models. We also incorporate two recent and popular speech recognition methods, Wav2Vec and SpecAugment, into our classifiers and test whether they are also applicable to language identification. Of the models we implement, the X-vector based deep feed-forward network classifier obtains the highest F1-score, 0.91, on a target set of five languages. The SpecAugment data augmentation method increases classification accuracy when applied to the input mel-spectrograms of the CRNN architecture. Wav2Vec speech representations obtain lower classification accuracies than some of the other methods but still achieve promising results.
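To make the SpecAugment step concrete, here is a minimal, dependency-free sketch of the frequency and time masking it applies to a mel-spectrogram. It is an illustration only, not the authors' implementation: the function name `spec_augment`, the parameter names, the default mask widths, and the choice of zero as the fill value are all assumptions, and the time-warping component of SpecAugment is omitted.

```python
import random


def spec_augment(mel, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=20, rng=None):
    """Return a masked copy of `mel`, a list of n_mels rows of n_frames floats.

    Hypothetical helper: names, defaults, and the zero fill value are
    assumptions made for illustration.
    """
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in mel]          # copy so the input is untouched
    n_mels = len(out)
    n_frames = len(out[0]) if out else 0

    # Frequency masking: zero a random band of consecutive mel channels.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_mels))
        start = rng.randint(0, n_mels - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_frames

    # Time masking: zero a random run of consecutive frames in every channel.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_frames))
        start = rng.randint(0, n_frames - width)
        for row in out:
            row[start:start + width] = [0.0] * width

    return out
```

In a training loop, a fresh mask would typically be drawn each time an utterance is presented to the classifier, so the network rarely sees the same masked spectrogram twice.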

Author information

Corresponding author

Correspondence to Ali Haznedaroglu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Korkut, C., Haznedaroglu, A., Arslan, L. (2020). Comparison of Deep Learning Methods for Spoken Language Identification. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_23

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
