Abstract
This paper describes an experimental study on the identification of seven north-eastern (NE) low-resource languages (LRL) of India, namely Assamese, Bengali, Hindi, Manipuri, Mizo, Nagamese, and Nepali, using 55-dimensional hybrid features (HF) and a hierarchical technique. Developing language identification (LID) systems for LRL depends on the availability of specially curated speech data, which makes building a system for such under-resourced languages a challenging task. We collected around 42 h of speech data (35 h for training and 7 h for testing) for analysis of these seven LRL. The process used to design the speech database is generic enough to be applied to other languages as well. We compared the proposed methodology with a baseline system on the collected speech data. The experimental study shows that the proposed system outperforms the baseline system, and the results are encouraging for researchers working on low-resource languages. This initial study highlights the importance of HF for NE-LRL.
Acknowledgements
The authors would like to acknowledge the contribution of NERIST, Arunachal Pradesh, in supporting speech data collection from native speakers in north-east India. The authors are also thankful to CDAC, Kolkata, India, for the financial and infrastructural support needed to carry out this research.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Basu, J., Majumder, S. (2020). Identification of Seven Low-Resource North-Eastern Languages: An Experimental Study. In: Bhattacharyya, S., Mitra, S., Dutta, P. (eds) Intelligence Enabled Research. Advances in Intelligent Systems and Computing, vol 1109. Springer, Singapore. https://doi.org/10.1007/978-981-15-2021-1_9
DOI: https://doi.org/10.1007/978-981-15-2021-1_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2020-4
Online ISBN: 978-981-15-2021-1
eBook Packages: Intelligent Technologies and Robotics (R0)