Identification of Seven Low-Resource North-Eastern Languages: An Experimental Study

Intelligence Enabled Research

Abstract

This paper describes an experimental study on the identification of seven north-eastern (NE) low-resource languages (LRL) of India, namely Assamese, Bengali, Hindi, Manipuri, Mizo, Nagamese, and Nepali, using 55-dimensional hybrid features (HF) and a hierarchical technique. Developing language identification (LID) systems depends on the availability of specially curated speech data in the LRL, and building a system for such under-resourced languages is a challenging task. We have collected around 42 h of speech data (35 h for training and 7 h for testing) for analysis of the seven LRL. The process of designing the speech database has been kept generic enough to be used for other languages as well. We have compared our proposed methodology with a baseline system on the collected speech data. The experimental study shows that the proposed system outperforms the baseline system, and the results are encouraging for researchers working on low-resource languages. This initial study highlights the importance of HF for NE-LRL.

Acknowledgements

The authors would like to acknowledge the contribution of NERIST, Arunachal Pradesh, in supporting speech data collection from native speakers of north-east India. The authors are thankful to CDAC, Kolkata, India, for the necessary financial and infrastructural support to carry out this research activity.

Author information

Corresponding author

Correspondence to Joyanta Basu.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Basu, J., Majumder, S. (2020). Identification of Seven Low-Resource North-Eastern Languages: An Experimental Study. In: Bhattacharyya, S., Mitra, S., Dutta, P. (eds) Intelligence Enabled Research. Advances in Intelligent Systems and Computing, vol 1109. Springer, Singapore. https://doi.org/10.1007/978-981-15-2021-1_9
