Abstract
In recent years, speech-based computer interaction has become one of the most challenging and demanding applications in the field of human-computer interaction (HCI). Speech-based HCI offers a more natural way to interact with computers and does not require special training. In this paper, we build a speech-based HCI system in the form of an arithmetic calculator that uses Mel-Frequency Cepstral Coefficients (MFCCs) and Gaussian Mixture Models (GMMs). The system receives an arithmetic expression as a sequence of isolated spoken command words. MFCC features are extracted from these speech commands and used to train a GMM. The model obtained after iterative training is used to classify each input speech command as either a digit or an operator. Once the digits and operators have been recognized, the arithmetic expression is evaluated and its result is converted into an audio wave. The system was tested on a speech database consisting of single-digit numbers (0–9) and five basic arithmetic operators (+, −, ×, / and %). Its recognition accuracy is approximately 86%. The system supports interaction with machines through multiple modalities, and it can also assist visually impaired and physically challenged people.
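The feature-extraction step can be illustrated with a minimal, dependency-free MFCC sketch following the usual textbook pipeline: pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, logarithm and DCT-II. The abstract does not give the paper's parameters, so the 8 kHz sample rate, 25 ms frames, 10 ms hop, 20 mel filters and 13 coefficients below are assumed typical values, and all function names are hypothetical. A naive DFT is used for readability; a practical implementation would use an FFT.

```python
import cmath
import math


def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (high - low) / (n_filters + 1)
    edges = [int((n_fft + 1) * mel_to_hz(low + i * step) / sample_rate)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):          # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):         # falling slope, peak 1 at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2 (naive O(N^2) DFT for clarity)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded))
        spectrum.append(abs(acc) ** 2 / n_fft)
    return spectrum


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256,
         n_filters=20, n_ceps=13, pre_emphasis=0.97):
    """Return one list of n_ceps coefficients per frame.

    All default parameters are assumptions (typical values), not the paper's.
    """
    if frame_len > n_fft:
        raise ValueError("frame_len must not exceed n_fft")
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n-1].
    emphasised = [signal[0]] + [signal[n] - pre_emphasis * signal[n - 1]
                                for n in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]                      # Hamming window
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emphasised) - frame_len + 1, hop):
        frame = [emphasised[start + n] * window[n] for n in range(frame_len)]
        spectrum = power_spectrum(frame, n_fft)
        # Log energy in each mel band (small offset avoids log(0) on silence).
        log_e = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                 for row in bank]
        # DCT-II decorrelates the log energies; keep the first n_ceps terms.
        features.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                             for m, e in enumerate(log_e))
                         for c in range(n_ceps)])
    return features
```

Many systems also append delta coefficients or apply liftering; whether the paper does so is not stated in the abstract.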
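Training and recognition can be sketched as one diagonal-covariance GMM per vocabulary word, each fitted by expectation-maximization (EM), with an input command labelled by whichever model assigns its MFCC frames the highest total log-likelihood. Training one model per word, the diagonal covariances, the random-frame initialization, the variance floor and the iteration count are assumptions made for this illustration; `DiagonalGMM` and `classify` are hypothetical names.

```python
import math
import random


class DiagonalGMM:
    """Gaussian mixture with diagonal covariances, fitted by EM (illustrative)."""

    def __init__(self, n_components=4, n_iter=50, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor          # keeps variances from collapsing to 0
        self.rng = random.Random(seed)

    @staticmethod
    def _log_gauss(x, mean, var):
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, mean, var))

    @staticmethod
    def _logsumexp(values):
        top = max(values)
        return top + math.log(sum(math.exp(v - top) for v in values))

    def _log_joint(self, x):
        # log w_j + log N(x | mu_j, diag(var_j)) for every component j
        return [math.log(w) + self._log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, data):
        n, dim = len(data), len(data[0])
        # Initialise means from random frames and variances from the global spread.
        self.means = [list(x) for x in self.rng.sample(data, self.k)]
        mu = [sum(x[d] for x in data) / n for d in range(dim)]
        var = [max(sum((x[d] - mu[d]) ** 2 for x in data) / n, self.var_floor)
               for d in range(dim)]
        self.vars = [list(var) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for x in data:
                lj = self._log_joint(x)
                norm = self._logsumexp(lj)
                resp.append([math.exp(v - norm) for v in lj])
            # M-step: re-estimate weights, means and variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                        for r, x in zip(resp, data)) / nj,
                                    self.var_floor)
                                for d in range(dim)]
        return self

    def score(self, frames):
        """Total log-likelihood of a sequence of feature frames."""
        return sum(self._logsumexp(self._log_joint(x)) for x in frames)


def classify(models, frames):
    """Return the label whose GMM gives the frames the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(frames))
```

In practice the frames passed to `fit` would be the pooled MFCC vectors from all training utterances of one word, and `classify` would receive the frames of a single test utterance.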
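After recognition, the token sequence must be evaluated. The sketch below assumes that the recognizer emits the symbols `0`–`9`, `+`, `-`, `×`, `/` and `%`; that consecutive digits form one multi-digit operand; that `×`, `/` and `%` bind tighter than `+` and `-`; and that `%` denotes the remainder rather than a percentage. None of these conventions is stated in the abstract, and `evaluate` is a hypothetical name. Exact rational arithmetic (`fractions.Fraction`) avoids floating-point error in division.

```python
from fractions import Fraction

# Assumed operator precedence: higher binds tighter; all are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "×": 2, "/": 2, "%": 2}


def _apply_op(op, a, b):
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "×":
        return a * b
    if op == "/":
        return a / b      # exact Fraction division; b == 0 raises ZeroDivisionError
    return a % b          # assumption: '%' is the remainder, not a percentage


def evaluate(tokens):
    """Evaluate a well-formed infix sequence of recognised tokens."""
    # Merge runs of spoken digits into multi-digit operands (assumption).
    items = []
    for tok in tokens:
        if tok.isdigit() and items and isinstance(items[-1], Fraction):
            items[-1] = items[-1] * 10 + int(tok)
        elif tok.isdigit():
            items.append(Fraction(int(tok)))
        elif tok in PRECEDENCE:
            items.append(tok)
        else:
            raise ValueError(f"unrecognised token: {tok!r}")
    # Operator-precedence (shunting-yard style) evaluation with two stacks.
    values, ops = [], []
    for item in items:
        if isinstance(item, Fraction):
            values.append(item)
            continue
        while ops and PRECEDENCE[ops[-1]] >= PRECEDENCE[item]:
            b, a = values.pop(), values.pop()
            values.append(_apply_op(ops.pop(), a, b))
        ops.append(item)
    while ops:
        b, a = values.pop(), values.pop()
        values.append(_apply_op(ops.pop(), a, b))
    return values[0]
```

For example, the spoken sequence "three plus four times two", recognized as `["3", "+", "4", "×", "2"]`, evaluates to 11 under these assumptions; the result could then be passed to a speech-synthesis stage.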
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Husain, M., Meena, S.M., Gonal, M.K. (2016). Speech Based Arithmetic Calculator Using Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models. In: Nagar, A., Mohapatra, D., Chaki, N. (eds) Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics. Smart Innovation, Systems and Technologies, vol 43. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2538-6_22
DOI: https://doi.org/10.1007/978-81-322-2538-6_22
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2537-9
Online ISBN: 978-81-322-2538-6
eBook Packages: Engineering, Engineering (R0)