Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model

Dua, Mohit; Akanksha

doi:10.1007/978-981-19-7753-4_4

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 977)

Abstract

Low-resource languages still need more efficient Automatic Speech Recognition (ASR) systems. This work develops an ASR system for Gujarati using a publicly available dataset. The front end integrates Mel-frequency Cepstral Coefficients (MFCC) with Constant Q Cepstral Coefficients (CQCC) for feature extraction. The back end is a hybrid acoustic model that combines a two-dimensional Convolutional Neural Network (Conv2D) with Bi-directional Gated Recurrent Units (BiGRU). The acoustic model is trained with the Connectionist Temporal Classification (CTC) loss function and decoded with a CTC and prefix-based greedy decoder. The joint MFCC and CQCC features yield a 10–19% improvement in Word Error Rate (WER) compared with isolated delta-delta features on the available integrated model.
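
To make the decoding and evaluation terms in the abstract concrete, the following is a minimal sketch of best-path (greedy) CTC decoding and of the WER metric. It is not the chapter's implementation: the function names, the choice of index 0 as the blank label, and the toy inputs are assumptions made for illustration, and the prefix-based decoder mentioned in the abstract is not reproduced here.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: take the top label per frame,
    merge consecutive repeats, then drop blank labels.
    `blank=0` is an assumed convention, not taken from the chapter."""
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # match or substitution
        prev_row = row
    return prev_row[-1] / len(ref)
```

For example, with hypothetical per-frame scores whose top labels are 1, 1, blank, 1, 2, `ctc_greedy_decode` returns `[1, 1, 2]`: the first two frames merge into one label, and the blank separates it from the next occurrence of label 1. Likewise, `word_error_rate("a b c d", "a x c")` is 0.5, since one substitution and one deletion occur against four reference words.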

Author information

Authors and Affiliations

Department of Computer Engineering, National Institute of Technology, Kurukshetra, Haryana, India
Mohit Dua & Akanksha

Corresponding author

Correspondence to Mohit Dua.

Editor information

Editors and Affiliations

Department of ECE, PPG Institute of Technology, Coimbatore, Tamil Nadu, India
V. Bindhu
Faculdade de Engenharia, Departamento de, Universidade do Porto, Porto, Portugal
João Manuel R. S. Tavares
San Jose State University, Fremont, CA, USA
Chandrasekar Vuppalapati

About this paper

Cite this paper

Dua, M., Akanksha (2023). Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model. In: Bindhu, V., Tavares, J.M.R.S., Vuppalapati, C. (eds) Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 977. Springer, Singapore. https://doi.org/10.1007/978-981-19-7753-4_4

DOI: https://doi.org/10.1007/978-981-19-7753-4_4
Published: 15 March 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7752-7
Online ISBN: 978-981-19-7753-4
eBook Packages: Engineering, Engineering (R0)
