Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm

Sood, Meenakshi; Jain, Shruti

doi:10.1007/978-3-030-66218-9_27


Part of the book series: Advances in Science, Technology & Innovation (ASTI)


Abstract

Speech has been an integral part of human life, closely linked to hearing, one of the five primitive senses of the human body. As a result, any software or application based on speech recognition enjoys a high degree of acceptance and has a wide range of applications in defense, security, health care, and home automation. Speech is a non-stationary signal whose characteristics change rapidly; when examined over a very short time scale, however, it can be treated as stationary, with only small variations. In this paper, the authors address the recognition of multiple isolated words spoken by a single user. The system uses Mel-frequency cepstral coefficients (MFCCs) for feature extraction and dynamic time warping (DTW) for feature matching, both chosen for their simplicity and efficiency. Short-time spectral analysis, the core of the MFCC algorithm, is adopted for feature extraction. DTW is used to compare two signals that differ in speaking rate or are shifted in time relative to each other. Because no two utterances of a word are ever exactly the same, DTW is well suited to comparing spoken words.
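The two stages named in the abstract (short-time MFCC feature extraction, then DTW matching against stored word templates) can be illustrated with a minimal, dependency-free Python sketch. Every parameter here is an illustrative assumption, not the authors' setting: 8 kHz sampling, 25 ms Hamming-windowed frames with a 10 ms hop, pre-emphasis coefficient 0.97, a 256-point spectrum, 26 mel filters, 13 coefficients, Euclidean frame distance inside DTW, and a nearest-template decision rule. The function names `mfcc`, `dtw_distance`, and `recognize` are hypothetical. A naive DFT is used only to avoid dependencies; a practical system would use an FFT.

```python
import math

# Hypothetical sketch: parameter values below are assumptions, not the paper's.

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):   # rising slope of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def power_spectrum(frame, n_fft):
    """Hamming-windowed |DFT|^2 / n_fft for bins 0..n_fft//2.

    The frame (length <= n_fft) is implicitly zero-padded to n_fft points.
    """
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n_fft) for t, x in enumerate(w))
        im = sum(x * math.sin(2 * math.pi * k * t / n_fft) for t, x in enumerate(w))
        spec.append((re * re + im * im) / n_fft)
    return spec

def mfcc(signal, sr=8000, frame_ms=25, hop_ms=10, n_fft=256, n_filters=26, n_coeffs=13):
    """Return one n_coeffs-dimensional MFCC vector per short-time frame."""
    # Pre-emphasis boosts high frequencies: y[t] = x[t] - 0.97 * x[t-1].
    x = [signal[0]] + [signal[t] - 0.97 * signal[t - 1] for t in range(1, len(signal))]
    frame_len, hop = sr * frame_ms // 1000, sr * hop_ms // 1000
    bank = mel_filterbank(n_filters, n_fft, sr)
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        spec = power_spectrum(x[start:start + frame_len], n_fft)
        # Log of each mel filter's energy; the small floor avoids log(0).
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
        # DCT-II of the log filterbank energies gives the cepstral coefficients.
        feats.append([sum(e * math.cos(math.pi * c * (i + 0.5) / n_filters)
                          for i, e in enumerate(log_e)) for c in range(n_coeffs)])
    return feats

def dtw_distance(a, b):
    """Classic DTW: cumulative Euclidean cost of the best monotonic alignment."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: insertion, deletion, or diagonal match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(features, templates):
    """Return the template word whose feature sequence is DTW-closest."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

In this sketch, enrolment would store `mfcc(...)` of each word recorded by the single user as a template, and recognition calls `recognize(mfcc(utterance), templates)`. Because DTW lets one frame align with several frames of the other sequence, a slower utterance of the same word still reaches a low cumulative cost, which is the property the abstract relies on.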



Author information

Authors and Affiliations

Department of CDC, NITTTR, 160019, Chandigarh, India
Meenakshi Sood
Department of ECE, JUIT, Solan, 173234, Himachal Pradesh, India
Shruti Jain


Corresponding author

Correspondence to Shruti Jain.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, India
Pradeep Kumar Singh
Wroclaw University of Economics, Jan Wyzykowski University in Polkowice, Polkowice, Poland
Zdzislaw Polkowski
Nirma University, Ahmedabad, Gujarat, India
Sudeep Tanwar
ITS Mohan Nagar, Ghaziabad, India
Sunil Kumar Pandey
Faculty of Economic Sciences, University of Craiova, Craiova, Romania
Gheorghe Matei
University of Pitesti, Pitesti, Romania
Daniela Pirvu


About this paper

Cite this paper

Sood, M., Jain, S. (2021). Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm. In: Singh, P.K., Polkowski, Z., Tanwar, S., Pandey, S.K., Matei, G., Pirvu, D. (eds) Innovations in Information and Communication Technologies (IICT-2020). Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-030-66218-9_27


DOI: https://doi.org/10.1007/978-3-030-66218-9_27
Published: 16 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66217-2
Online ISBN: 978-3-030-66218-9
eBook Packages: Earth and Environmental Science (R0)
