Abstract
Recent advances have brought great success to speech recognition and natural language processing, and fine-tuning large pre-trained models has become a standard approach. Such models work well for languages with vast resources, but not for languages with comparatively few. In this paper, an automatic speech recognition (ASR) model for low-resource Indian regional languages is proposed, built on the well-known Wav2Vec2 model and evaluated on a Tamil dataset. The objective is to minimize the word error rate (WER) when transcribing an audio file (e.g., a WAV file) containing speech. This is achieved by pre-training Wav2Vec2 and then fine-tuning it with a linear layer added on top, fed by a custom tokenizer. The approach achieves 61.3% WER on the Mozilla Common Voice dataset, outperforming the previously reported WER of 69.76%.
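The WER metric that the abstract optimizes is the word-level Levenshtein (edit) distance between the reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch of that computation is below; the function name `wer` and its signature are illustrative, not taken from the paper (in practice a library such as `jiwer` is commonly used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis that substitutes one word out of four yields a WER of 0.25; the 61.3% figure reported here means roughly six word errors per ten reference words on the Common Voice Tamil test set.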
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ghadekar, P., Jhanwar, K., Karpe, A., Sivanandan, A., Shetty, T., Khushalani, P. (2023). ASR for Indian Regional Languages Using Fine-Tuned Wav2Vec2 Model. In: Chakraborty, B., Biswas, A., Chakrabarti, A. (eds) Advances in Data Science and Computing Technologies. ADSC 2022. Lecture Notes in Electrical Engineering, vol 1056. Springer, Singapore. https://doi.org/10.1007/978-981-99-3656-4_14
DOI: https://doi.org/10.1007/978-981-99-3656-4_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3655-7
Online ISBN: 978-981-99-3656-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)