Abstract
Accurately identifying the language of a speech utterance is the primary task of a language identification system, and such systems are widely used in multilingual speech applications. This article presents an Indian language identification system that uses textural descriptors extracted from time-frequency visual representations of speech. Conventional LPC and MFCC feature extraction approaches achieve limited detection accuracy for language identification. In the first step, the input speech signal is converted into spectrogram, MFCC and cochleagram image representations. Each of these visual representations can be treated as a texture image that characterizes energy variations in different frequency bands over time. In the second step, completed local binary pattern (CLBP), local phase quantization (LPQ) and Weber local descriptor (WLD) textural features are extracted from the visual representations. Finally, a kernel extreme learning machine (KELM) classifier assigns the language-specific class label. The proposed algorithm is validated on the IIIT-H Indic speech databases, which cover seven Indian languages from the Indo-Aryan and Dravidian families. The experimental results indicate that the proposed time-frequency texture descriptor method outperforms the other machine learning algorithms considered.
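The three-stage pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain log-magnitude spectrogram only, a basic 8-neighbour local binary pattern histogram as a simplified stand-in for CLBP, LPQ and WLD, and the standard closed-form kernel ELM solution with an RBF kernel. All function names, the frame length, the hop size, and the `c` and `gamma` values are hypothetical choices made for the example, and the demo signals are synthetic tones rather than speech.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Log-magnitude STFT scaled to an 8-bit grayscale image (freq x time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    log_mag = np.log1p(np.abs(np.fft.rfft(frames, axis=1)).T)
    span = log_mag.max() - log_mag.min()
    scaled = (log_mag - log_mag.min()) / (span + 1e-12)
    return (255 * scaled).astype(np.uint8)


def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes.

    Assumption: basic LBP is used here as a simpler stand-in for CLBP.
    """
    img = image.astype(np.int32)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:rows - 1 + dy, 1 + dx:cols - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelELM:
    """Kernel ELM: beta = (Omega + I / c)^-1 T, scores = K(x, X_train) beta."""

    def __init__(self, c=100.0, gamma=1.0):
        self.c, self.gamma = c, gamma

    def fit(self, x, y):
        self.x_train = np.asarray(x, dtype=np.float64)
        y = np.asarray(y)
        self.classes = np.unique(y)
        # One column per class, +1 for the true class and -1 otherwise.
        targets = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        omega = rbf_kernel(self.x_train, self.x_train, self.gamma)
        self.beta = np.linalg.solve(omega + np.eye(len(omega)) / self.c, targets)
        return self

    def predict(self, x):
        k = rbf_kernel(np.asarray(x, dtype=np.float64), self.x_train, self.gamma)
        return self.classes[np.argmax(k @ self.beta, axis=1)]


if __name__ == "__main__":
    # Synthetic two-class demo: noisy tones stand in for speech recordings.
    rng = np.random.default_rng(0)
    t = np.arange(4000) / 8000.0
    signals, labels = [], []
    for label, freq in enumerate((300.0, 1200.0)):
        for _ in range(5):
            noise = 0.1 * rng.standard_normal(t.size)
            signals.append(np.sin(2 * np.pi * freq * t) + noise)
            labels.append(label)
    features = np.stack([lbp_histogram(spectrogram_image(s)) for s in signals])
    model = KernelELM(c=100.0, gamma=10.0).fit(features, np.array(labels))
    print(features.shape, model.predict(features))
```

In a fuller version of this sketch, the histogram step would be replaced by the CLBP, LPQ and WLD descriptors computed on each of the three image representations, and the regularization and kernel parameters would be tuned on held-out data.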
Data Availability Statement
The data underlying this article were provided by IIIT-H Indic Speech Databases by permission.
Cite this article
Birajdar, G.K., Raveendran, S. Indian language identification using time-frequency texture features and kernel ELM. J Ambient Intell Human Comput 14, 13237–13250 (2023). https://doi.org/10.1007/s12652-022-03781-5
DOI: https://doi.org/10.1007/s12652-022-03781-5