Abstract
Low-resource languages still need more efficient Automatic Speech Recognition (ASR) systems. This work develops an ASR system for Gujarati using a publicly available dataset. The front end integrates Mel-frequency Cepstral Coefficients (MFCC) with Constant Q Cepstral Coefficients (CQCC) for feature extraction. The back end is a hybrid acoustic model that combines a two-dimensional Convolutional Neural Network (Conv2D) with Bi-directional Gated Recurrent Units (BiGRU). The system uses the Connectionist Temporal Classification (CTC) loss function together with a CTC prefix-based greedy decoder on top of the acoustic model. With this integrated model, the joint MFCC and CQCC features yield a 10–19% improvement in Word Error Rate (WER) over isolated delta-delta features.
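The decoding and scoring steps named in the abstract can be illustrated with a minimal sketch. It assumes per-frame class scores from some acoustic model and shows plain best-path (greedy) CTC decoding followed by a word-level WER computation. The function names `ctc_greedy_decode` and `word_error_rate`, the blank index of 0, and the toy inputs are all hypothetical choices for illustration, not the authors' implementation, and the sketch does not reproduce the prefix-based variant of the decoder.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: take the top class per frame,
    merge consecutive repeats, then drop blank symbols."""
    decoded: List[int] = []
    previous = None
    for scores in frame_scores:
        # Index of the highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the class changes and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words.
    prev_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,                           # deletion
                row[j - 1] + 1,                            # insertion
                prev_row[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev_row = row
    return prev_row[-1] / len(ref)


# Toy example: 7 frames, 3 classes (class 0 is the assumed blank).
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
label_ids = ctc_greedy_decode(toy_scores)
wer = word_error_rate("a b c d", "a x c")
```

The blank frame between the two runs of class 1 is what lets CTC emit the same label twice in a row; without it the repeats would be merged into one.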
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dua, M., Akanksha (2023). Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model. In: Bindhu, V., Tavares, J.M.R.S., Vuppalapati, C. (eds) Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 977. Springer, Singapore. https://doi.org/10.1007/978-981-19-7753-4_4
DOI: https://doi.org/10.1007/978-981-19-7753-4_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7752-7
Online ISBN: 978-981-19-7753-4
eBook Packages: Engineering, Engineering (R0)