A Deep Learning Approach to Speech Recognition of Digits

Advances in Computing and Data Sciences (ICACDS 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1045)

Abstract

Speech recognition has gained popularity in recent years, with a user base that ranges from organizations to individuals. Many virtual voice assistants are now on the market, including Siri, Cortana and Alexa. However, they all require an active internet connection and are not supported on all devices. We have built a digit recognition system that works offline on desktop and mobile devices. This speech-to-text system can recognize a spoken sequence of digits from 0 to 9 and can distinguish variations such as “double two” and “triple six”. Our approach records a digit sequence as audio input, pre-processes it by extracting the peak amplitudes, extracts Mel Frequency Cepstral Coefficients (MFCC) as features, and finally feeds the feature vector to an artificial neural network that outputs the most probable class. We then exported the model to a minimized configuration that is simple to use on mobile platforms. We obtained an accuracy of 87% on the validation set and 86% on the test set.
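The last step of the pipeline, picking the most probable class and turning variations such as “double two” into digits, can be illustrated with a minimal sketch. The abstract does not give the class list or say how repeat words are handled, so the label set `LABELS`, the helper names and the rule that a repeat word applies to the next digit are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical label set: ten digit classes plus two repeat modifiers.
# The paper's actual classes are not stated in the abstract.
LABELS = ["zero", "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine", "double", "triple"]
DIGITS = {name: str(i) for i, name in enumerate(LABELS[:10])}
REPEATS = {"double": 2, "triple": 3}


def most_probable_label(probs):
    """Return the label whose score is highest (argmax over class scores)."""
    if len(probs) != len(LABELS):
        raise ValueError("expected one score per label")
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


def expand_labels(labels):
    """Turn a sequence of predicted labels into a digit string.

    Assumption: a repeat word multiplies the digit that follows it.
    """
    out = []
    repeat = 1
    for label in labels:
        if label in REPEATS:
            repeat = REPEATS[label]
        elif label in DIGITS:
            out.append(DIGITS[label] * repeat)
            repeat = 1
        else:
            raise ValueError(f"unknown label: {label}")
    if repeat != 1:
        raise ValueError("repeat word without a following digit")
    return "".join(out)


if __name__ == "__main__":
    print(expand_labels(["double", "two", "triple", "six", "nine"]))
```

Under these assumptions, each segmented utterance would first be mapped to a label with `most_probable_label`, and the resulting label sequence would then be passed to `expand_labels`.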



Author information

Corresponding author

Correspondence to Joel Kiran Kumar.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gopinath, G., Kumar, J.K., Shetty, N., Shylaja, S.S. (2019). A Deep Learning Approach to Speech Recognition of Digits. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds) Advances in Computing and Data Sciences. ICACDS 2019. Communications in Computer and Information Science, vol 1045. Springer, Singapore. https://doi.org/10.1007/978-981-13-9939-8_11

  • DOI: https://doi.org/10.1007/978-981-13-9939-8_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9938-1

  • Online ISBN: 978-981-13-9939-8

  • eBook Packages: Computer Science (R0)
