Hybrid Feature Extraction Technique for Tamil Automatic Speech Recognition System in Noisy Environment

  • Conference paper
Recent Trends in Communication and Intelligent Systems

Part of the book series: Algorithms for Intelligent Systems ((AIS))

Abstract

An automatic speech recognition (ASR) system recognizes the text transcript from a given speech signal. The speech signal may contain either isolated words or large-vocabulary continuous speech (LVCS). Isolated words can be recognized with high accuracy in a clean environment, but recognizing continuous speech involves various factors, such as the speech corpus, the speaker, and environmental noise, that directly affect the accuracy of the ASR system. In the proposed work, a hybrid feature extraction technique combines perceptual linear prediction (PLP) and mel-frequency cepstral coefficients (MFCC) to improve the accuracy of ASR in noisy environments. Voice activity detection (VAD)-based frame dropping is used to improve phoneme modeling by removing pauses and distorted elements from the speech signal. The proposed hybrid model with VAD is implemented on a self-generated speech corpus and shows a relative improvement of about 12% in recognition rate over the state-of-the-art methodology.
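
To illustrate the idea, the following minimal Python sketch (not the authors' implementation) shows how VAD-based frame dropping and MFCC/PLP feature fusion might be combined. It assumes librosa for MFCC and short-time energy; compute_plp is a hypothetical placeholder for any PLP front end, and the energy threshold is an assumed parameter rather than a value from the paper.

```python
import numpy as np
import librosa

def energy_vad_mask(y, frame_length=400, hop_length=160, threshold_db=-35.0):
    """Energy-based VAD: keep frames whose short-time energy is within
    `threshold_db` dB of the loudest frame; quieter (pause) frames are dropped.
    The threshold value is an assumption, not taken from the paper."""
    rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]
    rms_db = 20.0 * np.log10(rms + 1e-10)
    return rms_db > (rms_db.max() + threshold_db)  # boolean mask, one entry per frame

def compute_plp(y, sr, n_coeff=13, hop_length=160):
    """Hypothetical placeholder: librosa has no PLP front end, so plug in any
    PLP implementation here; it should return an array of shape (n_coeff, n_frames)."""
    raise NotImplementedError("supply a PLP front end")

def hybrid_features(y, sr=16000, n_mfcc=13, hop_length=160):
    """Extract MFCC and PLP streams, drop non-speech frames via VAD,
    and stack the two streams frame-by-frame into one hybrid feature matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    plp = compute_plp(y, sr, hop_length=hop_length)
    mask = energy_vad_mask(y, hop_length=hop_length)
    n = min(mfcc.shape[1], plp.shape[1], mask.shape[0])  # align frame counts
    keep = mask[:n]
    # Concatenate the retained MFCC and PLP frames along the feature axis.
    return np.vstack([mfcc[:, :n][:, keep], plp[:, :n][:, keep]])
```

In practice, the fused feature matrix would feed the acoustic model trained on the Tamil corpus; the concatenation order and dimensionalities shown here are illustrative assumptions, not the paper's exact configuration.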

Author information

Correspondence to S. Girirajan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Girirajan, S., Pandian, A. (2022). Hybrid Feature Extraction Technique for Tamil Automatic Speech Recognition System in Noisy Environment. In: Pundir, A.K.S., Yadav, N., Sharma, H., Das, S. (eds) Recent Trends in Communication and Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-1324-2_2
