Abstract
An automatic speech recognition (ASR) system transcribes text from a given speech signal. The signal may contain either isolated words or large-vocabulary continuous speech (LVCS). Isolated words can be recognized with high accuracy in a clean environment, but recognizing continuous speech involves parameters such as the speech corpus, the speaker, and environmental noise, which directly affect the accuracy of the ASR system. In the proposed work, a hybrid feature extraction technique combines perceptual linear prediction (PLP) and mel frequency cepstral coefficients (MFCC) to improve ASR accuracy in noisy environments. Voice activity detection (VAD)-based frame dropping improves phoneme modeling by removing pauses and distorted segments from the speech signal. The proposed hybrid model with VAD is implemented on a self-generated speech corpus and shows a relative 12% increase in recognition rate compared with the state-of-the-art methodology.
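The pipeline the abstract describes (frame the signal, drop non-speech frames via VAD, then concatenate two per-frame feature vectors) can be sketched as below. This is a minimal illustration, not the authors' implementation: the energy threshold is an assumed, simplified form of VAD, and `fake_mfcc` / `fake_plp` are placeholder stand-ins for real MFCC and PLP front ends.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def vad_keep_mask(frames, ratio=0.1):
    """Energy-based VAD (simplified stand-in): keep frames whose energy
    exceeds a fraction of the peak frame energy."""
    energy = np.sum(frames ** 2, axis=1)
    return energy >= ratio * energy.max()

def hybrid_features(frames, extract_a, extract_b):
    """Concatenate two per-frame feature vectors (e.g. MFCC and PLP)."""
    return np.hstack([extract_a(frames), extract_b(frames)])

# Synthetic example: 0.5 s of tone surrounded by silence, 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
speech = 0.5 * np.sin(2 * np.pi * 440 * t)
signal = np.concatenate([np.zeros(sr // 4), speech, np.zeros(sr // 4)])

frames = frame_signal(signal)
kept = frames[vad_keep_mask(frames)]   # frame dropping removes the pauses

# Hypothetical "extractors" standing in for real MFCC / PLP computation.
fake_mfcc = lambda f: f[:, :13]        # 13-dim placeholder per frame
fake_plp = lambda f: f[:, :9]          # 9-dim placeholder per frame
feats = hybrid_features(kept, fake_mfcc, fake_plp)
print(kept.shape[0], "of", frames.shape[0], "frames kept;", feats.shape[1], "dims")
```

The key design point is that VAD runs before feature extraction, so the silent and distorted frames never reach the acoustic model; the hybrid vector is then a simple per-frame concatenation of the two feature streams.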
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Girirajan, S., Pandian, A. (2022). Hybrid Feature Extraction Technique for Tamil Automatic Speech Recognition System in Noisy Environment. In: Pundir, A.K.S., Yadav, N., Sharma, H., Das, S. (eds) Recent Trends in Communication and Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-1324-2_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1323-5
Online ISBN: 978-981-19-1324-2
eBook Packages: Intelligent Technologies and Robotics (R0)