Abstract
Speech has long been an integral part of human life. As such, software and applications based on speech recognition enjoy a high degree of acceptance and a wide range of uses in defense, security, health care, and home automation. Speech is a fluctuating signal whose characteristics vary at a high rate. When examined over a very short time scale, however, it can be treated as a stationary signal with very small variations. In this paper, the authors work on the detection of a single user using multiple isolated words as speech signals. The system is designed around feature extraction using Mel-frequency cepstral coefficients (MFCCs) and feature matching using dynamic time warping (DTW), both chosen for their simplicity and efficiency. Short-time spectral analysis, the main part of the MFCC algorithm, is adopted for feature extraction. DTW is used to compare two signals that vary in speed or have a phase difference between them. Since two utterances of a word are never exactly the same, the DTW algorithm is well suited to comparing two words.
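The feature-matching step can be illustrated with a minimal sketch of the classic DTW recurrence. It is not the authors' implementation: the function names `dtw_distance` and `closest_template`, the Euclidean frame distance, and the toy one-dimensional "feature" sequences are assumptions made for illustration. In a real system each frame would be an MFCC vector.

```python
import math

def dtw_distance(a, b):
    """Cumulative DTW alignment cost between two sequences of feature vectors.

    Assumption: frames are compared with the Euclidean distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def closest_template(query, templates):
    """Return the label of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

# Toy data (hypothetical): the same contour spoken "slowly" and "quickly",
# plus a different contour used as a competing template.
slow = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)]
fast = [(0.0,), (1.0,), (2.0,)]
other = [(2.0,), (1.0,), (0.0,)]

templates = {"up": fast, "down": other}
best = closest_template(slow, templates)
```

Because the warping path may repeat frames of either sequence, `slow` and `fast` align at zero cost despite their different lengths, which is the property that makes DTW attractive for comparing words spoken at different speeds.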
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sood, M., Jain, S. (2021). Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm. In: Singh, P.K., Polkowski, Z., Tanwar, S., Pandey, S.K., Matei, G., Pirvu, D. (eds) Innovations in Information and Communication Technologies (IICT-2020). Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-030-66218-9_27
DOI: https://doi.org/10.1007/978-3-030-66218-9_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66217-2
Online ISBN: 978-3-030-66218-9
eBook Packages: Earth and Environmental Science (R0)