Bengali Spoken Numerals Recognition by MFCC and GMM Technique

  • Conference paper
Advances in Electronics, Communication and Computing (ETAEERE 2020)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 709)

Abstract

Speech is the most natural medium of human communication, and speech recognition systems allow people to interact with computers by voice. Speech recognition for English already supports English voice-command applications. In rural and semi-urban areas of India, however, many people have limited knowledge of English, so automatic speech recognition in regional languages is needed. Here, we build a Gaussian mixture model (GMM)-based recognition system for isolated spoken numerals in Bengali (also called Bangla), using mel-frequency cepstral coefficients (MFCC) as features. The proposed system achieved 91.7% correct prediction on a Bangla numeral data set of 1000 audio samples in 10 classes, which is satisfactory relative to previous work on Bangla spoken digit recognition.
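The abstract does not spell out the classifier's details, so the following is only a minimal sketch of one common way to pair per-frame features with GMMs: fit one diagonal-covariance GMM per numeral class and assign an utterance to the class whose model gives the highest total log-likelihood over its frames. Everything here is an assumption for illustration, not the authors' implementation: the function names, the two components per model, the 13-dimensional frames, the labels "zero" and "one", and the random vectors that stand in for real MFCC frames.

```python
import numpy as np


def log_weighted_densities(X, weights, means, variances):
    """Return log(weight_k * N(x_n | mean_k, diag(var_k))) with shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
        + (diff ** 2 / variances[None, :, :]).sum(axis=2)
    )
    return np.log(weights)[None, :] + log_gauss


def fit_diag_gmm(X, n_components=2, n_iter=50, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM to frames X of shape (n_frames, n_dims) by EM."""
    rng = np.random.default_rng(seed)
    n_frames, _ = X.shape
    # Initialise means from random frames, variances from the global variance.
    means = X[rng.choice(n_frames, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = log_weighted_densities(X, weights, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + reg
    return weights, means, variances


def total_log_likelihood(X, model):
    """Sum of per-frame log-likelihoods of X under one GMM."""
    weights, means, variances = model
    log_p = log_weighted_densities(X, weights, means, variances)
    return np.logaddexp.reduce(log_p, axis=1).sum()


def classify(X, models):
    """Pick the class label whose GMM scores the frames highest."""
    return max(models, key=lambda label: total_log_likelihood(X, models[label]))


def make_frames(center, seed, n_frames=200, n_dims=13):
    """Synthetic stand-in for MFCC frames (assumption: NOT real audio features)."""
    rng = np.random.default_rng(seed)
    return rng.normal(center, 1.0, size=(n_frames, n_dims))


# Hypothetical two-class example; a real system would have one model per numeral.
models = {
    "zero": fit_diag_gmm(make_frames(0.0, seed=1)),
    "one": fit_diag_gmm(make_frames(3.0, seed=2)),
}
print(classify(make_frames(3.0, seed=11), models))
```

In a real pipeline, `make_frames` would be replaced by MFCC extraction from each recording, and `models` would hold ten entries, one per numeral class.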

Author information

Corresponding author

Correspondence to Bachchu Paul.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Paul, B., Bera, S., Paul, R., Phadikar, S. (2021). Bengali Spoken Numerals Recognition by MFCC and GMM Technique. In: Mallick, P.K., Bhoi, A.K., Chae, GS., Kalita, K. (eds) Advances in Electronics, Communication and Computing. ETAEERE 2020. Lecture Notes in Electrical Engineering, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-15-8752-8_9

  • DOI: https://doi.org/10.1007/978-981-15-8752-8_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-8751-1

  • Online ISBN: 978-981-15-8752-8

  • eBook Packages: Computer Science, Computer Science (R0)
