Speech Based Arithmetic Calculator Using Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models

Husain, Moula; Meena, S. M.; Gonal, Manjunath K.

doi:10.1007/978-81-322-2538-6_22

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 43)

Abstract

In recent years, speech-based interaction has become one of the most challenging and in-demand applications in the field of human-computer interaction (HCI). Speech-based HCI offers a more natural way to interact with computers and does not require special training. In this paper, we attempt to build such a system by developing a speech-based arithmetic calculator using Mel-Frequency Cepstral Coefficients (MFCCs) and Gaussian Mixture Models (GMMs). The system receives an arithmetic expression as a sequence of isolated spoken command words. MFCC features are extracted from these speech commands and used to train a Gaussian mixture model. The model obtained after iterative training is used to classify each input speech command as either a digit or an operator. Once the digits and operators have been recognized, the arithmetic expression is evaluated and the result is converted into an audio waveform. The system is tested on a speech database consisting of the single-digit numbers (0–9) and five basic arithmetic operators \( (+, -, \times, /, \text{and } \%) \). The recognition accuracy of the system is around 86%. Our speech-based HCI system lets users interact with machines through multiple modalities, and it can assist visually impaired and physically challenged people.
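
The recognition and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one diagonal-covariance GMM per command word, classification by maximum log-likelihood, and strictly left-to-right evaluation of the recognized tokens; the abstract confirms none of these details. The function names, the two-dimensional stand-in "features", and the hand-picked model parameters are all hypothetical. Feature extraction and EM training are omitted.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        # Log of each weighted component density for this frame.
        comp = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp.append(log_p)
        # Log-sum-exp over components for numerical stability.
        m = max(comp)
        total += m + math.log(sum(math.exp(c - m) for c in comp))
    return total

def classify(frames, models):
    """Return the label whose GMM assigns the frames the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, *models[label]))

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "%": lambda a, b: a % b,
}

def evaluate(tokens):
    """Evaluate alternating digit/operator tokens left to right (assumed order)."""
    result = int(tokens[0])
    for op, digit in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, int(digit))
    return result

# Hypothetical models: label -> (weights, means, variances), not trained on real speech.
models = {
    "7": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
    "+": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "2": ([1.0], [[-5.0, 5.0]], [[1.0, 1.0]]),
}

# Three toy "utterances", each a short list of 2-D feature frames.
utterances = [
    [[0.1, -0.2], [0.0, 0.3]],
    [[5.2, 5.1], [5.9, 6.1]],
    [[-5.1, 4.8]],
]

tokens = [classify(frames, models) for frames in utterances]
print(tokens, evaluate(tokens))
```

In a real system the frames would be MFCC vectors computed from audio, and the mixture parameters would be fitted by EM rather than set by hand. Operator precedence and division by zero would also need an explicit policy.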

Author information

Authors and Affiliations

B.V.B College of Engineering and Technology, Vidyanagar, Hubli, 580031, Karnataka, India
Moula Husain, S. M. Meena & Manjunath K. Gonal

Corresponding author

Correspondence to Moula Husain.

Editor information

Editors and Affiliations

Department of Computer Science, Liverpool Hope University, Liverpool, United Kingdom
Atulya Nagar
Dept. of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, Odisha, India
Durga Prasad Mohapatra
Computer Science & Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki

About this paper

Cite this paper

Husain, M., Meena, S.M., Gonal, M.K. (2016). Speech Based Arithmetic Calculator Using Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models. In: Nagar, A., Mohapatra, D., Chaki, N. (eds) Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics. Smart Innovation, Systems and Technologies, vol 43. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2538-6_22

DOI: https://doi.org/10.1007/978-81-322-2538-6_22
Published: 08 October 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2537-9
Online ISBN: 978-81-322-2538-6
eBook Packages: Engineering, Engineering (R0)
