
ASR for Indian Regional Languages Using Fine-Tuned Wav2Vec2 Model

  • Conference paper
Advances in Data Science and Computing Technologies (ADSC 2022)

Abstract

Recent advances have brought considerable success to the fields of speech recognition and natural language processing. Many systems are now built by fine-tuning large pre-trained models, an approach that works well for languages with abundant resources but far less so for languages with comparatively few. In this paper, an automatic speech recognition (ASR) model for low-resource Indian regional languages is proposed. The model is built on the well-known Wav2Vec2 model, and this paper focuses on the Tamil dataset. The objective is to minimize the word error rate (WER) when transcribing an audio file (e.g., a WAV file) containing speech. This is achieved by starting from a pre-trained Wav2Vec2 model and fine-tuning it with a linear layer added on top and a custom tokenizer. The approach achieves a WER of 61.3% on the Mozilla Common Voice dataset, improving on the previously reported WER of 69.76%.
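The fine-tuning recipe summarized above (a pre-trained Wav2Vec2 encoder, a linear CTC head sized to a custom character vocabulary, and evaluation by WER) can be sketched with the Hugging Face transformers library. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the checkpoint facebook/wav2vec2-large-xlsr-53, the toy Latin character vocabulary, and the dummy audio are placeholders, and a real run would build the vocabulary from the Common Voice Tamil transcripts and train over that corpus.

```python
# Minimal sketch of the described setup: pre-trained Wav2Vec2 + linear CTC head
# fed by a custom tokenizer, with WER as the evaluation metric.
# All names and hyperparameters below are illustrative assumptions.
import json
import tempfile

import torch
from jiwer import wer
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

# 1. Custom character-level tokenizer (toy Latin vocabulary here; the paper
#    would use Tamil characters extracted from the corpus transcripts).
vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
vocab["|"] = len(vocab)        # word delimiter (stands in for spaces)
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)    # doubles as the CTC blank token
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(vocab, f)
    vocab_path = f.name

tokenizer = Wav2Vec2CTCTokenizer(
    vocab_path, unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# 2. Pre-trained encoder; the linear CTC head on top is freshly initialised
#    and sized to the custom vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",          # assumed checkpoint
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature encoder frozen
model.train()

# 3. One illustrative training step on dummy data; a real run would iterate
#    over the Common Voice Tamil split with a Trainer or a manual loop.
audio = torch.randn(16000).numpy()              # 1 s of fake 16 kHz speech
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("hello world", return_tensors="pt").input_ids
loss = model(inputs.input_values, labels=labels).loss
loss.backward()

# 4. The reported metric: word error rate between reference and hypothesis.
print(wer(["hello world"], ["hello word"]))     # -> 0.5 (one of two words wrong)
```

Using the [PAD] token as the CTC blank and a "|" word delimiter follows the common convention for Wav2Vec2 CTC fine-tuning; only the vocabulary and training data need to change for Tamil.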



Author information

Corresponding author

Correspondence to Khushi Jhanwar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ghadekar, P., Jhanwar, K., Karpe, A., Sivanandan, A., Shetty, T., Khushalani, P. (2023). ASR for Indian Regional Languages Using Fine-Tuned Wav2Vec2 Model. In: Chakraborty, B., Biswas, A., Chakrabarti, A. (eds) Advances in Data Science and Computing Technologies. ADSC 2022. Lecture Notes in Electrical Engineering, vol 1056. Springer, Singapore. https://doi.org/10.1007/978-981-99-3656-4_14
