Abstract
Automatic Speech Recognition (ASR) is the computer-driven transcription of spoken words into readable text. Its main aim is to correctly identify the words a person speaks, ideally independent of vocabulary size, noise, speaker characteristics, or accent. This paper applies clustering techniques to ASR to recognize spoken English digits. The clustering algorithms used are K-means and Gaussian Expectation Maximization (GEM). Mel-frequency cepstral coefficients (MFCCs) are used to extract features from the speech signal. Performance is calculated for each individual digit using a hard threshold technique, and the results obtained with the two clustering techniques are compared.
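The clustering-based recognition idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one K-means codebook is trained per digit and that a test utterance is assigned to the digit whose codebook yields the lowest average distortion. Synthetic 3-dimensional vectors stand in for MFCC frames, and all function names, the codebook size, and the data are hypothetical. The GEM variant and the hard threshold scoring are not reproduced here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(frames, k, iters=20, seed=0):
    """Plain K-means: returns k centroids (a codebook) for the given frames."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(frames, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda i: sq_dist(f, centroids[i]))
            clusters[nearest].append(f)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest centroid."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def recognize(frames, codebooks):
    """Pick the digit whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda digit: distortion(frames, codebooks[digit]))

def synthetic_frames(center, n, rng, spread=0.3):
    """Stand-in for MFCC frames: Gaussian noise around a per-digit center (assumption)."""
    return [[c + rng.gauss(0, spread) for c in center] for _ in range(n)]

rng = random.Random(42)
digit_centers = {"zero": [0.0, 0.0, 0.0], "one": [5.0, 5.0, 5.0]}  # hypothetical data

# Train one codebook per digit, then classify an unseen utterance.
codebooks = {d: kmeans(synthetic_frames(c, 60, rng), k=4)
             for d, c in digit_centers.items()}
test_utterance = synthetic_frames(digit_centers["one"], 30, rng)
predicted = recognize(test_utterance, codebooks)
print(predicted)
```

In a real system the synthetic frames would be replaced by MFCC vectors extracted from recorded digits, and the codebook size would be tuned on held-out data.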
Acknowledgements
The author(s) acknowledge the travel support of the Technical Education Quality Improvement Programme (TEQIP-III) of the Government of India for attending the conference.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Debnath, S., Roy, P. (2020). Automatic Speech Recognition Based on Clustering Technique. In: Mandal, J., Bhattacharya, D. (eds) Emerging Technology in Modelling and Graphics. Advances in Intelligent Systems and Computing, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-7403-6_59
DOI: https://doi.org/10.1007/978-981-13-7403-6_59
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7402-9
Online ISBN: 978-981-13-7403-6
eBook Packages: Engineering, Engineering (R0)