Speech Based Arithmetic Calculator Using Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models

  • Moula Husain
  • S. M. Meena
  • Manjunath K. Gonal
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 43)


In recent years, speech-based computer interaction has become one of the most challenging and in-demand applications in the field of human-computer interaction (HCI). Speech-based HCI offers a more natural way to interact with computers and requires no special training. In this paper, we build an HCI system in the form of a speech-based arithmetic calculator using Mel-Frequency Cepstral Coefficients (MFCCs) and Gaussian Mixture Models (GMMs). The system receives an arithmetic expression as a sequence of isolated spoken command words. MFCC features are extracted from these speech commands and used to train a Gaussian mixture model. The model obtained after iterative training is used to classify each input speech command as either a digit or an operator. Once the digits and operators have been recognized, the arithmetic expression is evaluated and its result is converted into an audio waveform. The system is tested on a speech database consisting of single-digit numbers (0–9) and five basic arithmetic operators (\(+\), \(-\), \(\times\), \(/\) and \(\%\)). The recognition accuracy of the system is around 86%. Our speech-based HCI system offers the benefit of interacting with machines through multiple modalities, and it can also assist visually impaired and physically challenged people.
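The abstract describes a pipeline in which the MFCC frames of each isolated command word are scored by Gaussian mixture models trained iteratively (the keywords name the EM algorithm), after which the recognized digits and operator are evaluated. The Python sketch below illustrates that recognition-and-evaluation loop under assumptions that are not stated in the paper: one diagonal-covariance GMM per command word, classification by the highest total log-likelihood over an utterance's frames, a three-word "digit operator digit" expression format, `*` as the token for \(\times\), and `%` read as the modulo operator. MFCC extraction is not shown; synthetic 2-D vectors stand in for MFCC frames. The function names `train_gmm`, `score`, `classify` and `evaluate` are hypothetical.

```python
import math
import random

# Hypothetical sketch (not the authors' code): one diagonal-covariance GMM per
# command word, trained with EM. In the real system the frames would be MFCC
# vectors; here they are assumed to be precomputed lists of floats.


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def train_gmm(frames, k=2, iters=20, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM to a list of frames with EM."""
    rng = random.Random(seed)
    n, d = len(frames), len(frames[0])
    # Initialise: means from random frames, variances from the global variance.
    means = [list(f) for f in rng.sample(frames, k)]
    mu = [sum(f[j] for f in frames) / n for j in range(d)]
    gvar = [max(var_floor, sum((f[j] - mu[j]) ** 2 for f in frames) / n)
            for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for f in frames:
            logs = [math.log(weights[c]) + log_gauss_diag(f, means[c], variances[c])
                    for c in range(k)]
            z = logsumexp(logs)
            resp.append([math.exp(lp - z) for lp in logs])
        # M-step: re-estimate mixture weights, means and (floored) variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * f[j] for r, f in zip(resp, frames)) / nc
                        for j in range(d)]
            variances[c] = [max(var_floor,
                                sum(r[c] * (f[j] - means[c][j]) ** 2
                                    for r, f in zip(resp, frames)) / nc)
                            for j in range(d)]
    return weights, means, variances


def score(gmm, frames):
    """Total log-likelihood of an utterance (a list of frames) under one GMM."""
    weights, means, variances = gmm
    return sum(logsumexp([math.log(w) + log_gauss_diag(f, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for f in frames)


def classify(models, frames):
    """Return the command word whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda word: score(models[word], frames))


# Assumed operator tokens; "%" is treated as modulo here.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b,
       "%": lambda a, b: a % b}


def evaluate(tokens):
    """Evaluate a recognised [digit, operator, digit] command sequence."""
    a, op, b = tokens
    return OPS[op](int(a), int(b))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-ins for MFCC frames: each word's frames cluster near a centre.
    centres = {"3": (0.0, 0.0), "+": (5.0, 5.0), "4": (-5.0, 5.0)}

    def fake_frames(centre, count):
        return [[rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)]
                for _ in range(count)]

    models = {word: train_gmm(fake_frames(c, 40)) for word, c in centres.items()}
    spoken = ["3", "+", "4"]
    recognised = [classify(models, fake_frames(centres[w], 10)) for w in spoken]
    print(recognised, evaluate(recognised))
```

Run as a script, this trains three toy word models and prints `['3', '+', '4'] 7`. Keeping one model per word and picking the best-scoring one means that adding a new command only requires training one more GMM; the number of mixture components and MFCC dimensions used in the paper are not given in the abstract, so the defaults here are placeholders.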


Keywords

MFCC · GMM · EM algorithm



Copyright information

© Springer India 2016

Authors and Affiliations

  • Moula Husain¹
  • S. M. Meena¹
  • Manjunath K. Gonal¹

¹ B.V.B College of Engineering and Technology, Hubli, India
