Deep Neural Networks for Spoken Language Identification in Short Utterances

Sinha, Shweta; Agrawal, S. S.

doi:10.1007/978-3-030-95711-7_24

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1546)

Included in the following conference series:

International Conference on Artificial Intelligence and Speech Technology

Abstract

This work examines language identification (LID) on short segments created from short-duration utterances. For low-resourced languages, the availability of data is itself a challenge, and the paper applies deep neural networks (DNNs) to such a setting. It presents a feed-forward deep neural network (FF-DNN) for LID that uses acoustic features of short-time utterances, and evaluates two DNN network topologies on the LID task. The results are compared with a well-established i-vector baseline, which represents the acoustic characteristics of speech with MFCC-SDC (mel-frequency cepstral coefficients with shifted delta cepstra) features and uses a support vector machine (SVM) back end as the classifier. Both systems were built to identify Hindi and Punjabi, two widely spoken Indian languages. The speech utterances are divided into short segments of 5 s, 10 s, 20 s, and 35 s duration. System performance is measured as equal error rate (EER, %): for short segments the DNN system achieves a relative improvement of 3%, and averaged over all utterances the DNN reduces the error rate by 2%.
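Because the results are reported as equal error rate (EER), the operating point at which the false-acceptance rate equals the false-rejection rate, a small illustration of the metric may help. The sketch below estimates EER from detection scores by sweeping candidate thresholds. It assumes that higher scores favor the target language, that labels are 1 for target trials and 0 for non-target trials, and it uses a hypothetical helper name, `compute_eer`. It is not the authors' evaluation code.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate by sweeping thresholds over observed scores.

    Assumption: a trial is accepted when its score >= threshold;
    label 1 = target-language trial, label 0 = non-target trial.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = float("inf"), None
    # Candidate thresholds: every observed score, plus one that rejects everything.
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= t for s in nontargets) / len(nontargets)  # false acceptances
        frr = sum(s < t for s in targets) / len(targets)          # false rejections
        gap = abs(far - frr)
        if gap < best_gap:
            # Midpoint of FAR and FRR at the closest crossing found so far.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Overlapping toy scores: one non-target outscores one target.
    print(compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

Taking the midpoint of the two rates at the closest crossing is a common approximation, since with a finite number of trials FAR and FRR rarely coincide exactly. Multiplying the result by 100 gives EER in percent, the unit used in the abstract.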

Author information

Authors and Affiliations

Amity School of Engineering and Technology, Amity University Haryana, Gurugram, India
Shweta Sinha
KIIT College of Engineering, Gurugram, Haryana, India
S. S. Agrawal

Editor information

Editors and Affiliations

Indira Gandhi Delhi Technical University for Women, Delhi, India
Amita Dev
Kamrah Institute of Information Technology, Gurgaon, India
S. S. Agrawal
Indira Gandhi Delhi Technical University for Women, Delhi, India
Arun Sharma

About this paper

Cite this paper

Sinha, S., Agrawal, S.S. (2022). Deep Neural Networks for Spoken Language Identification in Short Utterances. In: Dev, A., Agrawal, S.S., Sharma, A. (eds) Artificial Intelligence and Speech Technology. AIST 2021. Communications in Computer and Information Science, vol 1546. Springer, Cham. https://doi.org/10.1007/978-3-030-95711-7_24

DOI: https://doi.org/10.1007/978-3-030-95711-7_24
Published: 29 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95710-0
Online ISBN: 978-3-030-95711-7
eBook Packages: Computer Science, Computer Science (R0)