Abstract
Speech is the most natural form of vocal communication and one of the most comfortable ways for humans to communicate with each other. Likewise, a speech recognition system is needed for communicating with computers by voice. English speech recognition already lets us operate voice-command applications in English. In rural and semi-urban areas of India, however, many people have limited knowledge of English, so automatic speech recognition in regional languages is necessary. Here, we have built a Gaussian Mixture Model (GMM)-based recognition system for isolated spoken numerals in Bengali (also called Bangla), in which mel-frequency cepstral coefficients (MFCC) are used as features. The proposed system achieved 91.7% prediction accuracy on a Bangla numeral data set of 1000 audio samples across 10 classes, which is satisfactory compared with previous Bangla spoken digit recognition work.
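The classification scheme named in the abstract (one GMM per numeral class, with an utterance assigned to the class whose model gives its feature frames the highest likelihood) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the diagonal covariances, the number of mixture components, the EM iteration count, and all function names are assumptions, and the 13-dimensional "frames" are synthetic stand-ins for MFCC vectors rather than features extracted from audio.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def fit_gmm(frames, k=2, iters=15, var_floor=1e-3, seed=0):
    """Fit a k-component diagonal GMM to feature frames with plain EM.

    k, iters and var_floor are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    means = [list(f) for f in rng.sample(frames, k)]
    g_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    g_var = [max(sum((f[d] - g_mean[d]) ** 2 for f in frames) / n, var_floor)
             for d in range(dim)]
    variances = [list(g_var) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for f in frames:
            logs = [math.log(w) + log_gauss_diag(f, m, v)
                    for w, m, v in zip(weights, means, variances)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / n
            means[j] = [sum(r[j] * f[d] for r, f in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(sum(r[j] * (f[d] - means[j][d]) ** 2
                        for r, f in zip(resp, frames)) / nj, var_floor)
                for d in range(dim)]
    return weights, means, variances


def score(gmm, frames):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    weights, means, variances = gmm
    total = sum(logsumexp([math.log(w) + log_gauss_diag(f, m, v)
                           for w, m, v in zip(weights, means, variances)])
                for f in frames)
    return total / len(frames)


def classify(models, frames):
    """Return the label whose GMM scores the utterance highest."""
    return max(models, key=lambda label: score(models[label], frames))


def make_frames(center, n, dim, rng):
    """Synthetic stand-in for MFCC frames: Gaussian noise around a center."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


# Toy training set: three hypothetical numeral classes with separated centers.
rng = random.Random(42)
train = {"0": make_frames(0.0, 200, 13, rng),
         "1": make_frames(5.0, 200, 13, rng),
         "2": make_frames(10.0, 200, 13, rng)}
models = {label: fit_gmm(frames) for label, frames in train.items()}

test_utterance = make_frames(5.0, 30, 13, rng)
print(classify(models, test_utterance))
```

In a real system the synthetic frames would be replaced by MFCC vectors computed from each recording, and the ten numeral classes would each get their own model. The decision rule itself stays the same.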
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Paul, B., Bera, S., Paul, R., Phadikar, S. (2021). Bengali Spoken Numerals Recognition by MFCC and GMM Technique. In: Mallick, P.K., Bhoi, A.K., Chae, GS., Kalita, K. (eds) Advances in Electronics, Communication and Computing. ETAEERE 2020. Lecture Notes in Electrical Engineering, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-15-8752-8_9
DOI: https://doi.org/10.1007/978-981-15-8752-8_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8751-1
Online ISBN: 978-981-15-8752-8
eBook Packages: Computer Science, Computer Science (R0)