Speech Based Arithmetic Calculator Using Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models

  • Conference paper
Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 43)

Abstract

In recent years, speech-based computer interaction has become one of the most challenging and demanding applications in the field of human-computer interaction (HCI). Speech-based human-computer interaction offers a more natural way to interact with computers and does not require special training. In this paper, we build a human-computer interaction system in the form of a speech-based arithmetic calculator using Mel-Frequency Cepstral Coefficients (MFCCs) and Gaussian Mixture Models (GMMs). The system receives an arithmetic expression as a sequence of isolated spoken command words. MFCC acoustic features are extracted from these speech commands and used to train a Gaussian mixture model. The model obtained after iterative training is used to classify each input speech command as either a digit or an operator. Once the digits and operators have been recognized, the arithmetic expression is evaluated and the result is converted into an audio waveform. The system is tested on a speech database consisting of the single-digit numbers (0–9) and five basic arithmetic operators \( (+, -, \times, /, \text{ and } \%) \). The recognition accuracy of the system is around 86%. Our speech-based HCI system offers the benefit of interacting with machines through multiple modalities. It can also assist visually impaired and physically challenged people.
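The recognition and evaluation stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the number of mixture components, the covariance structure, the MFCC dimensionality, or the expression grammar. The sketch assumes one diagonal-covariance GMM per command word, a maximum-likelihood decision rule, a three-token expression (digit, operator, digit), and `%` read as the modulo operator. MFCC extraction and EM training are not shown, and the names `gmm_log_likelihood`, `classify`, and `evaluate` are hypothetical.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: list of feature vectors (e.g. MFCC frames of one spoken word).
    weights, means, variances: per-component parameters (assumed already trained).
    """
    total = 0.0
    for x in frames:
        component_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            component_logs.append(log_p)
        # log-sum-exp over components for numerical stability
        peak = max(component_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in component_logs))
    return total


def classify(frames, models):
    """Return the label whose GMM assigns the frames the highest log-likelihood.

    models: dict mapping a label ("0".."9", "+", ...) to (weights, means, variances).
    """
    return max(models, key=lambda label: gmm_log_likelihood(frames, *models[label]))


# Assumption: "%" denotes modulo; the abstract only lists the symbol.
OPERATIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "%": lambda a, b: a % b,
}


def evaluate(tokens):
    """Evaluate a recognized [digit, operator, digit] token sequence."""
    left, op, right = tokens
    if op not in OPERATIONS:
        raise ValueError(f"unknown operator: {op!r}")
    a, b = int(left), int(right)
    if op in ("/", "%") and b == 0:
        raise ZeroDivisionError("division or modulo by zero")
    return OPERATIONS[op](a, b)
```

With toy two-dimensional models in place of trained ones, `classify(frames, models)` returns the best-scoring label for each spoken word, and the resulting labels can be passed to `evaluate`, for example `evaluate(["7", "+", "2"])`. Converting the result back to audio is outside the scope of this sketch.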

Author information

Corresponding author

Correspondence to Moula Husain.

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Husain, M., Meena, S.M., Gonal, M.K. (2016). Speech Based Arithmetic Calculator Using Mel-Frequency Cepstral Coefficients and Gaussian Mixture Models. In: Nagar, A., Mohapatra, D., Chaki, N. (eds) Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics. Smart Innovation, Systems and Technologies, vol 43. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2538-6_22

  • DOI: https://doi.org/10.1007/978-81-322-2538-6_22

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2537-9

  • Online ISBN: 978-81-322-2538-6

  • eBook Packages: Engineering, Engineering (R0)
