Identification of Seven Low-Resource North-Eastern Languages: An Experimental Study

Basu, Joyanta; Majumder, Swanirbhar

doi:10.1007/978-981-15-2021-1_9

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1109)

Abstract

This paper describes an experimental study on the identification of seven north-eastern (NE) low-resource languages (LRL) of India, namely Assamese, Bengali, Hindi, Manipuri, Mizo, Nagamese, and Nepali, using 55-dimensional hybrid features (HF) and a hierarchical technique. The development of language identification (LID) systems depends on the availability of specially curated speech data in LRL, and building a system for such under-resourced languages is a challenging task. We have collected around 42 h of speech data (35 h for training and 7 h for testing) for analysis of these seven LRL. The process of designing the speech database in LRL is generic enough to be used for other languages as well. We have compared our proposed methodology with a baseline system on the collected speech data. The experimental study shows that our proposed system outperforms the baseline system, and the results are encouraging for researchers working on low-resource languages. This initial study highlights the importance of HF for NE-LRL.

Acknowledgements

The authors would like to acknowledge the contribution of NERIST, Arunachal Pradesh, in supporting speech data collection from native speakers of north-east India. The authors are thankful to CDAC, Kolkata, India, for the necessary financial and infrastructural support to carry out this research activity.

Author information

Authors and Affiliations

CDAC, Kolkata, Salt Lake, Sector - V, Kolkata, 700091, India
Joyanta Basu
Department of Information Technology, Tripura University, Suryamaninagar, 799022, Tripura, India
Swanirbhar Majumder

Corresponding author

Correspondence to Joyanta Basu.

Editor information

Editors and Affiliations

RCC Institute of Information Technology, Kolkata, India
Siddhartha Bhattacharyya
Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Visva-Bharati University, Santiniketan, India
Paramartha Dutta

About this chapter

Cite this chapter

Basu, J., Majumder, S. (2020). Identification of Seven Low-Resource North-Eastern Languages: An Experimental Study. In: Bhattacharyya, S., Mitra, S., Dutta, P. (eds) Intelligence Enabled Research. Advances in Intelligent Systems and Computing, vol 1109. Springer, Singapore. https://doi.org/10.1007/978-981-15-2021-1_9

DOI: https://doi.org/10.1007/978-981-15-2021-1_9
Published: 05 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2020-4
Online ISBN: 978-981-15-2021-1
eBook Packages: Intelligent Technologies and Robotics (R0)
