
Hindi speech recognition in noisy environment using hybrid technique

  • Original Research
  • Published in: International Journal of Information Technology

Abstract

Automatic speech recognition (ASR) is generally analyzed for two types of word utterance: isolated words and continuous speech. Continuous speech is the natural way of speaking, but it is difficult for machines (speech recognizers) to recognize and is highly sensitive to environmental variations. Several parameters directly affect ASR performance, such as the size of the dataset/corpus, the type of dataset (isolated, spontaneous, or continuous), and the environment (noisy or clean). Speech recognizers generally perform well on isolated words in clean environments, but their performance degrades in noisy environments, especially for continuous words/sentences, and this remains a challenge. In this paper, a hybrid feature extraction technique is proposed that joins the core blocks of perceptual linear prediction (PLP) and Mel-frequency cepstral coefficients (MFCC) to improve the performance of speech recognizers under such conditions. A voice activity detection (VAD)-based frame-dropping algorithm is used only in the training phase of the ASR procedure, obviating its need in actual deployments. The motivation for this algorithm is the removal of pauses and distorted segments of speech, which further improves phoneme modeling. The proposed method shows an average performance improvement of 12.88% on standard datasets.
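The full text is behind a paywall here, so the paper's exact VAD formula is not available; as a rough illustration of the kind of VAD-based frame dropping the abstract describes, the following is a minimal sketch that drops frames whose log-energy falls below a threshold relative to the utterance peak. The frame sizes, the energy-threshold rule, and the −30 dB value are assumptions for illustration, not the authors' method.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def energy_vad_mask(frames, threshold_db=-30.0):
    """Keep frames whose log-energy is within threshold_db of the peak frame."""
    energy = np.sum(frames ** 2, axis=1) + 1e-12
    log_e = 10.0 * np.log10(energy / energy.max())
    return log_e > threshold_db

# Synthetic example: low-level noise, a 300 Hz tone standing in for speech, noise again.
rng = np.random.default_rng(0)
fs = 16000
x = np.concatenate([
    0.001 * rng.standard_normal(4000),
    np.sin(2 * np.pi * 300 * np.arange(8000) / fs),
    0.001 * rng.standard_normal(4000),
])

frames = frame_signal(x)
mask = energy_vad_mask(frames)
kept = frames[mask]  # only these frames would feed acoustic-model training
```

In the scheme the abstract describes, such dropping is applied only during training, so the deployed recognizer needs no VAD stage at run time.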



Acknowledgements

A special note of thanks is due to the ECE Department, NIT Kurukshetra, Haryana, India, for providing the required infrastructure and research environment.

Author information


Corresponding author

Correspondence to Ashok Kumar.


About this article


Cite this article

Kumar, A., Mittal, V. Hindi speech recognition in noisy environment using hybrid technique. Int. j. inf. tecnol. 13, 483–492 (2021). https://doi.org/10.1007/s41870-020-00586-7
