Skip to main content

Language Identification Using Deep Convolutional Recurrent Neural Networks

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10639))

Included in the following conference series:

Abstract

Language Identification (LID) systems are used to classify the spoken language from a given audio sample and are typically the first step for many spoken language processing tasks, such as Automatic Speech Recognition (ASR) systems. Without automatic language detection, speech utterances cannot be parsed correctly and grammar rules cannot be applied, causing subsequent speech recognition steps to fail. We propose a LID system that solves the problem in the image domain, rather than the audio domain. We use a hybrid Convolutional Recurrent Neural Network (CRNN) that operates on spectrogram images of the provided audio snippets. In extensive experiments we show, that our model is applicable to a range of noisy scenarios and can easily be extended to previously unknown languages, while maintaining its classification accuracy. We release our code and a large scale training set for LID systems to the community.

C.Bartz and T. Herold—Equal contribution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://www.apple.com/ios/siri/.

  2. 2.

    https://assistant.google.com/.

  3. 3.

    https://github.com/HPI-DeepLearning/crnn-lid.

  4. 4.

    https://webgate.ec.europa.eu/sr/.

  5. 5.

    https://www.youtube.com/user/bbcnews.

References

  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., et al.: Tensorflow: large-scale machine learning on heterogeneous distributed systems arXiv:1603.04467 (2016)

  2. Blackman, R.B., Tukey, J.W.: The measurement of power spectra from the point of view of communications engineering-part I. Bell Labs Tech. J. 37(1), 185–282 (1958)

    Article  Google Scholar 

  3. Chollet, F.: keras: deep learning library for python. Runs on TensorFlow, Theano or CNTK (2017). https://github.com/fchollet/keras

  4. Dehak, N., Kenny, P.J., Dehak, R., Dumouchel, P., Ouellet, P.: Front-end factor analysis for speaker verification. IEEE Trans. Audio Speech Lang. Process. 19(4), 788–798 (2011)

    Article  Google Scholar 

  5. Gelly, G., Gauvain, J.L., Le, V., Messaoudi, A.: A divide-and-conquer approach for language identification based on recurrent neural networks. In: INTERSPEECH 2016, pp. 3231–3235 (2016)

    Google Scholar 

  6. Gelly, G., Gauvain, J.L., Lamel, L., Laurent, A., Le, V.B., Messaoudi, A.: Language Recognition for Dialects and Closely Related Languages. Odyssey, Bilbao (2016)

    Book  Google Scholar 

  7. Gonzalez-Dominguez, J., Lopez-Moreno, I., Moreno, P.J., Gonzalez-Rodriguez, J.: Frame-by-frame language identification in short utterances using deep neural networks. Neural Netw. 64, 49–58 (2015)

    Article  Google Scholar 

  8. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 448–456 (2015)

    Google Scholar 

  9. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations, San Diego (2015)

    Google Scholar 

  10. Lozano-Dez, A., Zazo Candil, R., Gonzlez Domnguez, J., Toledano, D.T., Gonzlez-Rodrguez, J.: An end-to-end approach to language identification in short utterances using convolutional neural networks. International Speech and Communication Association (2015)

    Google Scholar 

  11. Martnez, D., Plchot, O., Burget, L., Glembek, O., Matjka, P.: Language recognition in ivectors space. In: Twelfth Annual Conference of the International Speech Communication Association (2011)

    Google Scholar 

  12. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 807–814 (2010)

    Google Scholar 

  13. Plchot, O., Matejka, P., Glembek, O., Fer, R., Novotny, O., Pesan, J., Burget, L., Brummer, N., Cumani, S.: Bat system description for NIST LRE 2015. Odyssey 2016, pp. 166–173 (2016)

    Google Scholar 

  14. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

    Google Scholar 

  15. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 91–99. Curran Associates Inc., New York (2015)

    Google Scholar 

  16. Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2298–2304 (2016)

    Article  Google Scholar 

  17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)

    Google Scholar 

  18. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision, pp. 2818–2826 (2016)

    Google Scholar 

  19. Zazo, R., Lozano-Diez, A., Gonzalez-Dominguez, J., Toledano, D.T., Gonzalez-Rodriguez, J.: Language identification in short utterances using long short-term memory (LSTM) recurrent neural networks. PLoS ONE 11(1), e0146917 (2016)

    Article  Google Scholar 

  20. Zissman, M.A., et al.: Comparison of four approaches to automatic language identification of telephone speech. IEEE Trans. Speech Audio Process. 4(1), 31 (1996)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Christian Bartz .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bartz, C., Herold, T., Yang, H., Meinel, C. (2017). Language Identification Using Deep Convolutional Recurrent Neural Networks. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science(), vol 10639. Springer, Cham. https://doi.org/10.1007/978-3-319-70136-3_93

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-70136-3_93

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70135-6

  • Online ISBN: 978-3-319-70136-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics