Automatic Speech Recognition Based on Clustering Technique

Debnath, Saswati; Roy, Pinki

doi:10.1007/978-981-13-7403-6_59

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 937)

Abstract

Automatic Speech Recognition (ASR) is the computer-driven transcription of spoken words into readable text. Its main aim is to correctly identify the words spoken by a person, ideally independent of vocabulary size, noise, speaker characteristics, or accent. This paper applies clustering techniques to ASR to recognize spoken English digits. The clustering algorithms used are K-means and Gaussian Expectation Maximization (GEM), and Mel-frequency cepstral coefficients (MFCCs) are used as the speech features. Performance is calculated for each individual digit using a hard threshold technique, and the results of the two clustering techniques are compared.
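The abstract gives no implementation details, so the following is only a minimal sketch of how clustering-based digit recognition with a hard threshold might look, not the authors' method. It assumes each digit is modelled by a K-means codebook trained on its feature vectors, that a test utterance is assigned to the digit whose codebook gives the lowest average distortion, and that the match is rejected when that distortion exceeds a fixed threshold. The synthetic 3-dimensional vectors stand in for MFCC frames, and every name here (`kmeans`, `average_distortion`, `recognize`, `synthetic_frames`, the threshold value) is hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; returns k centroids (a codebook)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest centroid."""
    return sum(min(squared_distance(v, c) for c in codebook)
               for v in vectors) / len(vectors)


def recognize(vectors, codebooks, threshold):
    """Return the best-matching label, or None if its distortion exceeds
    the threshold (assumed reading of the 'hard threshold' decision)."""
    scores = {label: average_distortion(vectors, cb)
              for label, cb in codebooks.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= threshold else None


def synthetic_frames(center, n, rng, spread=0.3):
    """Stand-in for MFCC frames: Gaussian noise around a fixed center."""
    return [[c + rng.gauss(0, spread) for c in center] for _ in range(n)]


rng = random.Random(42)
centers = {"zero": [0.0, 0.0, 0.0], "one": [5.0, 5.0, 5.0]}

# Train one codebook per digit (assumption: 4 centroids per digit).
codebooks = {label: kmeans(synthetic_frames(c, 60, rng), k=4)
             for label, c in centers.items()}

test_one = synthetic_frames(centers["one"], 30, rng)
outlier = synthetic_frames([20.0, -20.0, 20.0], 30, rng)

result_one = recognize(test_one, codebooks, threshold=2.0)
result_outlier = recognize(outlier, codebooks, threshold=2.0)
print(result_one, result_outlier)
```

In this sketch, an utterance drawn from the "one" cluster is accepted as "one", while vectors far from every codebook are rejected. A mixture-model variant would replace the per-digit codebook with a Gaussian mixture fitted by expectation maximization and score utterances by likelihood instead of distortion.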

Acknowledgements

The author(s) acknowledge the traveling support of Technical Education Quality Improvement Programme (TEQIP-III) of the Government of India for attending the conference.

Author information

Authors and Affiliations

Computer Science and Engineering Department, NIT Silchar, Silchar, Assam, India
Saswati Debnath & Pinki Roy

Corresponding author

Correspondence to Saswati Debnath.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, West Bengal, India
Debika Bhattacharya

About this paper

Cite this paper

Debnath, S., Roy, P. (2020). Automatic Speech Recognition Based on Clustering Technique. In: Mandal, J., Bhattacharya, D. (eds) Emerging Technology in Modelling and Graphics. Advances in Intelligent Systems and Computing, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-7403-6_59

DOI: https://doi.org/10.1007/978-981-13-7403-6_59
Published: 17 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7402-9
Online ISBN: 978-981-13-7403-6
eBook Packages: Engineering, Engineering (R0)
