A Deep Learning Approach to Speech Recognition of Digits

Gopinath, Gagan; Kumar, Joel Kiran; Shetty, Nirmit; Shylaja, S. S.

doi:10.1007/978-981-13-9939-8_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1045)

Included in the following conference series:

International Conference on Advances in Computing and Data Sciences

Abstract

Speech recognition is one of the technologies that has gained popularity in recent years. Its user base is broad, ranging from organizations to individuals, because of the benefits it provides. Today, there are many virtual voice assistants on the market, such as Siri, Cortana and Alexa. However, they all require an active internet connection and aren't supported on all devices. We have built a digit recognition system that works offline on desktop and mobile devices. This speech-to-text system can recognize a spoken sequence of digits from 0 to 9 and distinguish variations such as "double two" and "triple six". Our approach records a digit-sequence audio clip as input, pre-processes it by extracting the peak amplitudes, applies Mel Frequency Cepstral Coefficients (MFCC) feature extraction, and finally feeds the feature vector to an artificial neural network that outputs the most probable class. We then exported the model to a minimized configuration that is simple to use on mobile platforms. We obtained an accuracy of 87% on the validation set and 86% on the test set.
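The last two steps described above, picking the most probable class and handling variations such as "double two", can be illustrated with a small sketch. This is not the authors' implementation: the label set (ten digit words plus "double" and "triple"), the function names `most_probable` and `decode`, and the idea of expanding modifiers as a post-processing step are all assumptions made for illustration. The abstract does not say how the network's classes are defined or combined.

```python
# Hypothetical label set: ten digit words plus two repeat modifiers.
# The paper's abstract does not specify the actual classes.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
REPEATS = {"double": 2, "triple": 3}


def most_probable(scores, labels):
    """Return the label whose score is highest (the argmax step)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def decode(words):
    """Turn recognised words into a digit string, expanding repeat modifiers."""
    out = []
    repeat = 1
    for word in words:
        if word in REPEATS:
            # Remember the multiplier and apply it to the next digit.
            repeat = REPEATS[word]
        elif word in DIGITS:
            out.append(DIGITS[word] * repeat)
            repeat = 1
        else:
            raise ValueError(f"unknown label: {word!r}")
    if repeat != 1:
        raise ValueError("repeat modifier not followed by a digit")
    return "".join(out)
```

Under these assumptions, `decode(["double", "two", "triple", "six", "nine"])` yields the string `"226669"`, and `most_probable` would be applied once per audio segment to obtain each word.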

Author information

Authors and Affiliations

Department of Computer Science and Engineering, PES University, Bengaluru, Karnataka, India
Gagan Gopinath, Joel Kiran Kumar, Nirmit Shetty & S. S. Shylaja

Corresponding author

Correspondence to Joel Kiran Kumar.

Editor information

Editors and Affiliations

University of KwaZulu-Natal, Durban, South Africa
Mayank Singh
Computer Science and Engineering, Jaypee Institute of Information Technology, Waknaghat, Himachal Pradesh, India
P.K. Gupta
Department of Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
Vipin Tyagi
ÚTIA AV ČR, Institute of Information Theory and Automation, Prague 8, Praha, Czech Republic
Jan Flusser
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Tuncer Ören
CSE Department, Inderprastha Engineering College, Ghaziabad, Uttar Pradesh, India
Rekha Kashyap

About this paper

Cite this paper

Gopinath, G., Kumar, J.K., Shetty, N., Shylaja, S.S. (2019). A Deep Learning Approach to Speech Recognition of Digits. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds) Advances in Computing and Data Sciences. ICACDS 2019. Communications in Computer and Information Science, vol 1045. Springer, Singapore. https://doi.org/10.1007/978-981-13-9939-8_11

DOI: https://doi.org/10.1007/978-981-13-9939-8_11
Published: 20 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9938-1
Online ISBN: 978-981-13-9939-8
eBook Packages: Computer Science, Computer Science (R0)