Abstract
Speech recognition has gained considerable popularity in recent years. Its user base ranges from organizations to individuals, drawn by the benefits it provides. Many virtual voice assistants are on the market today, with Siri, Cortana and Alexa among them. However, they all require an active internet connection and are not supported on all devices. We have built a digit recognition system that works offline on desktop and mobile devices. This speech-to-text system can recognize a spoken sequence of digits from 0 to 9 and distinguish variations such as “double two” and “triple six”. Our approach records a digit sequence as audio input, pre-processes it by extracting the peak amplitudes, extracts Mel Frequency Cepstral Coefficients (MFCC) as features, and finally feeds the feature vector to an artificial neural network that outputs the most probable class. We then exported the model to a minimized configuration that is simple to use on mobile platforms. We obtained an accuracy of 87% on the validation set and 86% on the test set.
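As an illustration of how variations such as “double two” and “triple six” could be turned into a digit string, the sketch below decodes a list of per-segment class labels. It is not the authors' implementation: the abstract does not list the network's output classes, so the label vocabulary (`DIGIT_WORDS`, `REPEAT_WORDS`) and the function `decode_labels` are hypothetical names, and the assumption that the repeat words are predicted as separate classes is ours.

```python
# Hypothetical label vocabulary: the abstract does not list the model's classes.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
# Assumption: "double" and "triple" are predicted as their own classes.
REPEAT_WORDS = {"double": 2, "triple": 3}


def decode_labels(labels):
    """Turn per-segment class labels into a digit string.

    A repeat word ("double", "triple") multiplies the digit that follows it.
    """
    digits = []
    repeat = 1  # how many times the next digit should be emitted
    for label in labels:
        word = label.lower()
        if word in REPEAT_WORDS:
            if repeat != 1:
                raise ValueError("repeat word must be followed by a digit")
            repeat = REPEAT_WORDS[word]
        elif word in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[word] * repeat)
            repeat = 1
        else:
            raise ValueError(f"unknown label: {label!r}")
    if repeat != 1:
        raise ValueError("sequence ends with a dangling repeat word")
    return "".join(digits)


if __name__ == "__main__":
    print(decode_labels(["nine", "double", "two", "triple", "six"]))
```

Under these assumptions, the labels for “nine, double two, triple six” decode to the string `922666`. Rejecting a dangling or doubled repeat word, rather than guessing, keeps a misrecognized segment from silently corrupting the digit sequence.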
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gopinath, G., Kumar, J.K., Shetty, N., Shylaja, S.S. (2019). A Deep Learning Approach to Speech Recognition of Digits. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds) Advances in Computing and Data Sciences. ICACDS 2019. Communications in Computer and Information Science, vol 1045. Springer, Singapore. https://doi.org/10.1007/978-981-13-9939-8_11
DOI: https://doi.org/10.1007/978-981-13-9939-8_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9938-1
Online ISBN: 978-981-13-9939-8
eBook Packages: Computer Science, Computer Science (R0)