Speech Signal Analysis for Language Identification Using Tensors

  • Conference paper
Machine Learning, Image Processing, Network Security and Data Sciences (MIND 2020)

Abstract

Language detection is the first step in speech recognition systems: it lets these systems make better use of the grammar and semantics of a language. For this reason, language identification is an active research area. Every language has specific sound patterns, rhythm, tone, nasal features, etc. We propose a tensor-based approach that uses Mel-frequency cepstral coefficients (MFCCs) to determine the characteristic features of a language, which can then be used to identify a spoken language. Tensor-based algorithms perform well in higher dimensions and scale better than the classic maximum likelihood estimation (MLE) used in latent variable modeling. These approaches also do not suffer from slow convergence and require fewer data points for learning. We conducted language identification experiments on native Indian English and Hindi for a selected set of speakers and observed an accuracy of around 70%.
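As a rough illustration of the kind of quantity tensor-based (method-of-moments) estimators start from, the sketch below computes an empirical third-order moment tensor from a set of feature frames. It is not taken from the paper: the function name `third_moment`, the random stand-in data, and the choice of three dimensions are all assumptions made for the example, and real MFCC frames would replace the synthetic vectors.

```python
import random


def third_moment(frames):
    """Empirical third-order moment: T[i][j][k] = mean over frames of x_i * x_j * x_k."""
    n = len(frames)
    d = len(frames[0])
    tensor = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for x in frames:
        for i in range(d):
            for j in range(d):
                xij = x[i] * x[j]
                for k in range(d):
                    tensor[i][j][k] += xij * x[k] / n
    return tensor


# Hypothetical stand-in for MFCC frames: 200 random 3-dimensional vectors.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
T = third_moment(frames)
```

The resulting tensor is symmetric in its three indices. A tensor decomposition of such moments is one way to recover latent-variable parameters without iterative likelihood maximization, which is the general contrast with MLE that the abstract draws.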


Notes

  1. http://bit.ly/lowResourceSpeechDataset

Author information

Correspondence to Bhagath Parabattina.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jain, S., Parabattina, B., Das, P.K. (2020). Speech Signal Analysis for Language Identification Using Tensors. In: Bhattacharjee, A., Borgohain, S., Soni, B., Verma, G., Gao, XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_25

  • DOI: https://doi.org/10.1007/978-981-15-6318-8_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-6317-1

  • Online ISBN: 978-981-15-6318-8

  • eBook Packages: Computer Science (R0)