Abstract
This work addresses language identification (LID) on short segments cut from short-duration utterances. For low-resourced languages, the availability of data is itself a challenge, and the paper applies deep neural networks (DNNs) in this low-resource setting. It presents a feed-forward deep neural network (FF-DNN) for language identification using acoustic features of short-time utterances. Two DNN topologies are evaluated on the LID task. The experimental results are compared with a well-established i-vector baseline, which uses MFCC-SDC (Mel-frequency cepstral coefficients with shifted delta cepstra) features to represent the acoustic characteristics of speech and a support vector machine (SVM) back end as the classifier. Both systems are applied to the identification of Hindi and Punjabi, two widely spoken Indian languages. The speech utterances are divided into short segments of 5 s, 10 s, 20 s and 35 s. Performance is measured as equal error rate (EER, %). For short segments, the DNN system achieves a relative improvement of 3%, and the average error rate over all utterances decreases by 2% with the DNN.
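Because the abstract reports results as EER, a minimal sketch of how an equal error rate can be computed from detection scores may help readers unfamiliar with the metric. This is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER as a fraction; a higher score means 'more likely the target language'.

    Hypothetical helper for illustration, not taken from the paper.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # Miss rate: target trials scored below the threshold (rejected).
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: non-target trials scored at or above the threshold (accepted).
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(miss - false_alarm))
    return (miss[idx] + false_alarm[idx]) / 2.0


# Toy scores (made up): one target trial scores lower than one non-target trial.
eer = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.5, 0.3, 0.2])
print(f"EER = {100 * eer:.1f}%")
```

With a finite set of trials the two error curves rarely cross exactly, so the sketch averages the miss and false-alarm rates at the closest threshold. Evaluation toolkits often interpolate instead.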
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Sinha, S., Agrawal, S.S. (2022). Deep Neural Networks for Spoken Language Identification in Short Utterances. In: Dev, A., Agrawal, S.S., Sharma, A. (eds) Artificial Intelligence and Speech Technology. AIST 2021. Communications in Computer and Information Science, vol 1546. Springer, Cham. https://doi.org/10.1007/978-3-030-95711-7_24
DOI: https://doi.org/10.1007/978-3-030-95711-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95710-0
Online ISBN: 978-3-030-95711-7
eBook Packages: Computer Science, Computer Science (R0)