CNN based keyword spotting: An application for context based voiced Odia words

Mohanty, Prithviraj; Nayak, Ajit Kumar

doi:10.1007/s41870-022-00992-z

Original Research
Published: 14 June 2022

Volume 14, pages 3647–3658, (2022)

International Journal of Information Technology

Prithviraj Mohanty¹ &
Ajit Kumar Nayak¹

Abstract

In recent times, pre-trained convolutional neural networks (CNNs) have outperformed traditional automatic speech recognition systems based on Gaussian mixture models (GMMs) on a variety of large-vocabulary benchmarks. The proposed work presents Odia keyword recognition using a CNN approach. Words naming different parts of the human body, spoken in the Odia language, are considered as the keywords for recognition. Mel-frequency cepstral coefficient (MFCC) features, along with spectrogram representations of the voiced keywords, are used as the input to the CNN. The proposed CNN model is trained and implemented using the Python framework Keras with TensorFlow as the backend. Various performance metrics are used to compare the proposed model with a fully connected deep neural network (DNN) model. A number of experiments are conducted with varying numbers of epochs and split ratios (training-validation-testing), and the accuracy, loss, and other performance metrics of both models are reported. It is observed that the proposed model outperforms the DNN model. Further, the average recognition accuracy of the proposed CNN model is compared with other approaches such as the hidden Markov model (HMM) and the support vector machine (SVM), and it achieves a higher average recognition rate than these state-of-the-art approaches.
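The abstract states that the CNN and DNN models are compared on accuracy, loss and "other performance metrics" across different training-validation-testing split ratios, but this preview gives neither the ratios nor the exact metric set. The following is a minimal, standard-library-only Python sketch of such an evaluation protocol, assuming a 70/15/15 split and assuming the metrics include accuracy, macro-averaged precision/recall/F1 and Cohen's kappa. The function names `split_dataset` and `classification_metrics`, the default ratios, and the English labels (`"hand"`, `"eye"`, `"nose"`) are hypothetical and are not taken from the authors' code.

```python
import random
from collections import Counter


def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle items and cut them into train/validation/test partitions.

    The default 70/15/15 ratio is an assumption for illustration only.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("expected three ratios summing to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def classification_metrics(y_true, y_pred):
    """Accuracy, macro-averaged precision/recall/F1 and Cohen's kappa."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # One-vs-rest counts for keyword class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    # p_e == 1 only in the degenerate single-class case, where accuracy is 1.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0

    k = len(labels)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
        "kappa": kappa,
    }


# Hypothetical English glosses standing in for spoken Odia body-part keywords.
y_true = ["hand", "hand", "eye", "eye", "nose", "nose"]
y_pred = ["hand", "eye", "eye", "eye", "nose", "hand"]
print(classification_metrics(y_true, y_pred))
```

In a Keras workflow, `y_pred` would usually be obtained by taking the argmax over the output of `model.predict(...)` on the test set. Stratifying the split by keyword class may be a reasonable refinement when the number of recordings per word is small.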

Author information

Authors and Affiliations

Department of CSIT, ITER, S’O’A (Deemed to be) University, Bhubaneswar, Odisha, India
Prithviraj Mohanty & Ajit Kumar Nayak

Corresponding author

Correspondence to Prithviraj Mohanty.

About this article

Cite this article

Mohanty, P., Nayak, A.K. CNN based keyword spotting: An application for context based voiced Odia words. Int. j. inf. tecnol. 14, 3647–3658 (2022). https://doi.org/10.1007/s41870-022-00992-z

Received: 19 February 2022
Accepted: 28 April 2022
Published: 14 June 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s41870-022-00992-z
