Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm

  • Conference paper
Innovations in Information and Communication Technologies (IICT-2020)

Part of the book series: Advances in Science, Technology & Innovation (ASTI)

Abstract

Speech is an integral part of human life and one of the most natural means of human communication. As a result, software based on speech recognition enjoys a high degree of acceptance and a wide range of applications in defense, security, health care, and home automation. Speech is a non-stationary signal whose characteristics vary rapidly. Examined over a very short time scale, however, it can be treated as a stationary signal with only small variations. In this paper, the authors address the detection of a single user from multiple isolated words used as speech signals. The system is built on feature extraction using Mel-frequency cepstral coefficients (MFCCs) and feature matching using dynamic time warping (DTW), both chosen for their simplicity and efficiency. Short-time spectral analysis, the core of the MFCC algorithm, is adopted for feature extraction. DTW is used to compare two signals that vary in speed or have a phase difference between them. Since two utterances of a word are never exactly the same, the DTW algorithm is well suited to comparing two spoken words.
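The feature-matching step described in the abstract can be illustrated with a minimal sketch of classic DTW. This is not the authors' implementation: the function names (`euclidean`, `dtw_distance`, `recognize`), the Euclidean frame distance, the unconstrained step pattern, and the toy one-dimensional "feature vectors" are all assumptions made for illustration. In a real system each frame would be an MFCC vector produced by the feature-extraction stage.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Accumulated cost of the best DTW alignment of two frame sequences.

    Each sequence is a list of feature vectors (one per frame). The
    sequences may differ in length, which is what lets DTW compare
    the same word spoken at different speeds.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Allowed moves: advance in seq_a, advance in seq_b, or both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the stored template closest to the query."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))


# Toy data (hypothetical): one-dimensional "features" per frame.
templates = {
    "yes": [[0.0], [1.0], [2.0], [1.0], [0.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}
# The same contour as "yes", spoken more slowly (frames repeated).
query = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0], [0.0]]

print(recognize(query, templates))  # yes
```

The slowed-down query aligns with the "yes" template at zero cost because DTW may map several query frames onto one template frame, whereas a frame-by-frame comparison would not even be defined for sequences of different lengths.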

Author information

Corresponding author

Correspondence to Shruti Jain.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sood, M., Jain, S. (2021). Speech Recognition Employing MFCC and Dynamic Time Warping Algorithm. In: Singh, P.K., Polkowski, Z., Tanwar, S., Pandey, S.K., Matei, G., Pirvu, D. (eds) Innovations in Information and Communication Technologies (IICT-2020). Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-030-66218-9_27
