Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model

  • Conference paper
Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 977)

Abstract

Low-resource languages still require more efficient Automatic Speech Recognition (ASR) systems. In the proposed work, an ASR system is developed on a publicly available Gujarati-language dataset. The front end integrates Mel-frequency Cepstral Coefficients (MFCC) with Constant Q Cepstral Coefficients (CQCC) for feature extraction. The back end is a hybrid acoustic model that combines a two-dimensional Convolutional Neural Network (Conv2D) with Bi-directional Gated Recurrent Units (BiGRU). To build the ASR system, the Connectionist Temporal Classification (CTC) loss function and a CTC prefix-based greedy decoder are also used with the acoustic model. The joint MFCC and CQCC features yield a 10–19% improvement in Word Error Rate (WER) compared with isolated delta-delta features when used with the available integrated model.
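The abstract names two standard building blocks, greedy CTC decoding and WER, without defining them. The following is a minimal sketch of both in plain Python and is not the authors' implementation. It assumes that the CTC blank label has index 0 and that WER is the word-level edit distance divided by the number of reference words. The function names `ctc_greedy_decode` and `word_error_rate` are hypothetical, and the prefix-based part of the decoder mentioned in the abstract is not reproduced here.

```python
from itertools import groupby


def ctc_greedy_decode(frame_label_ids, blank=0):
    """Best-path CTC decoding of per-frame argmax labels.

    Merges runs of identical consecutive labels, then drops blanks.
    The blank index (0) is an assumption, not taken from the paper.
    """
    collapsed = [label for label, _ in groupby(frame_label_ids)]
    return [label for label in collapsed if label != blank]


def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, the frame labels `[0, 1, 1, 0, 1, 2, 2, 0]` decode to `[1, 1, 2]`: the blank between the two runs of label 1 is what keeps the repeated label from being merged. A hypothesis with one substituted and one missing word against a four-word reference gives a WER of 0.5.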

Author information

Corresponding author

Correspondence to Mohit Dua.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dua, M., Akanksha (2023). Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model. In: Bindhu, V., Tavares, J.M.R.S., Vuppalapati, C. (eds) Proceedings of Fourth International Conference on Communication, Computing and Electronics Systems. Lecture Notes in Electrical Engineering, vol 977. Springer, Singapore. https://doi.org/10.1007/978-981-19-7753-4_4

  • DOI: https://doi.org/10.1007/978-981-19-7753-4_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7752-7

  • Online ISBN: 978-981-19-7753-4

  • eBook Packages: Engineering, Engineering (R0)
