
Automatic Speech Recognition of Bengali Using Kaldi

  • Conference paper
Proceedings of Second International Conference on Sustainable Expert Systems

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 351)

Abstract

Bengali is a prominent language of the Indian subcontinent. This paper compares Bengali speech recognition models built with the Kaldi and PyTorch toolkits. Deep learning has been employed to improve speech recognition performance, and we evaluate seven deep neural techniques on Bengali datasets. Grapheme-to-phoneme (G2P) conversion is an important module for Indian languages: it derives phone sequences from words written in Unicode. We develop an RNN-based G2P model for Bangla and show that it performs well for this purpose. Kaldi-based feature extraction with DNN-HMM acoustic models yielded the best WER of 4.16 when combined with the Li-GRU neural network. The aim is to demonstrate the performance achievable for Bengali using the current state-of-the-art (Kaldi) method.
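The abstract reports its headline result as a word error rate (WER). As a minimal sketch of how that metric is conventionally computed, the function below counts the word-level substitutions, deletions, and insertions needed to turn a hypothesis transcript into the reference, and divides by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the result is expressed as a percentage on the assumption that the reported 4.16 is a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER (in percent) of `hypothesis` against `reference`.

    Illustrative helper, not the paper's scoring code: it computes the
    word-level Levenshtein distance and normalizes by reference length.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the WER can exceed 100 when the hypothesis is much longer than the reference.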



Acknowledgements

We would like to acknowledge Vikram Lakkavalli, Research & Development Head, Kaizen Secure Voiz Pvt. Ltd., for suggesting the problem and taking part in technical discussions, without which this work would not have been possible. This work has been carried out at Kaizen Secure Voiz Pvt. Ltd. as a part of my thesis requirement.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Guchhait, S., Hans, A.S.A., Augustine, J. (2022). Automatic Speech Recognition of Bengali Using Kaldi. In: Shakya, S., Du, KL., Haoxiang, W. (eds) Proceedings of Second International Conference on Sustainable Expert Systems. Lecture Notes in Networks and Systems, vol 351. Springer, Singapore. https://doi.org/10.1007/978-981-16-7657-4_14
