Abstract
An automatic speech recognition (ASR) system transcribes text from a given speech signal. The signal may contain either isolated words or large-vocabulary continuous speech (LVCS). Isolated words can be recognized with high accuracy in a clean environment, but recognizing continuous speech involves parameters such as the speech corpus, the speaker, and environmental noise, which directly affect the accuracy of the ASR system. In the proposed work, a hybrid feature extraction technique combines perceptual linear prediction (PLP) and mel frequency cepstral coefficients (MFCC) to improve ASR accuracy in noisy environments. Voice activity detection (VAD)-based frame dropping improves phoneme modeling by removing pauses and distorted segments from the speech signal. The proposed hybrid model with VAD is implemented on a self-generated speech corpus and shows a relative 12% increase in recognition rate compared with the state-of-the-art methodology.
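The pipeline the abstract describes (frame the signal, drop non-speech frames via VAD, then concatenate two per-frame feature vectors) can be sketched as below. This is a minimal illustration, not the authors' implementation: the energy threshold is an assumed, simplified form of VAD, and `fake_mfcc` / `fake_plp` are placeholder stand-ins for real MFCC and PLP front ends.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def vad_keep_mask(frames, ratio=0.1):
    """Energy-based VAD (simplified stand-in): keep frames whose energy
    exceeds a fraction of the peak frame energy."""
    energy = np.sum(frames ** 2, axis=1)
    return energy >= ratio * energy.max()

def hybrid_features(frames, extract_a, extract_b):
    """Concatenate two per-frame feature vectors (e.g. MFCC and PLP)."""
    return np.hstack([extract_a(frames), extract_b(frames)])

# Synthetic example: 0.5 s of tone surrounded by silence, 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([np.zeros(sr // 4), speech, np.zeros(sr // 4)])

frames = frame_signal(signal)
kept = frames[vad_keep_mask(frames)]   # frame dropping removes the pauses

# Hypothetical "extractors" standing in for real MFCC / PLP computation.
fake_mfcc = lambda f: f[:, :13]        # 13-dim placeholder per frame
fake_plp = lambda f: f[:, :9]          # 9-dim placeholder per frame
feats = hybrid_features(kept, fake_mfcc, fake_plp)
print(kept.shape[0], "of", frames.shape[0], "frames kept;", feats.shape[1], "dims")
```

The key design point is that VAD runs before feature extraction, so the silent and distorted frames never reach the acoustic model; the hybrid vector is then a simple per-frame concatenation of the two feature streams.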
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Girirajan, S., Pandian, A. (2022). Hybrid Feature Extraction Technique for Tamil Automatic Speech Recognition System in Noisy Environment. In: Pundir, A.K.S., Yadav, N., Sharma, H., Das, S. (eds) Recent Trends in Communication and Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-1324-2_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1323-5
Online ISBN: 978-981-19-1324-2
eBook Packages: Intelligent Technologies and Robotics (R0)