CNN based keyword spotting: An application for context based voiced Odia words

  • Original Research
International Journal of Information Technology

Abstract

In recent times, pre-trained convolutional neural networks (CNNs) have outperformed traditional automatic speech recognition systems based on Gaussian mixture models (GMMs) on a variety of large-vocabulary benchmarks. This work presents Odia keyword recognition using a CNN approach. The names of different parts of the human body in spoken Odia are used as the keywords for recognition. MFCC features, together with spectrogram representations of the voiced keywords, are used as input to the CNN. The proposed CNN model is trained and implemented using the Python framework Keras with TensorFlow as the backend. Various performance metrics are used to compare the proposed model with a fully connected deep neural network (DNN) model. A number of experiments are conducted with varying numbers of epochs and split ratios (training-validation-testing), and the results report the accuracy, loss and other performance metrics of both models. The proposed model is observed to outperform the DNN model. Further, the average recognition accuracy of the proposed CNN model is compared with that of other approaches, namely hidden Markov model (HMM) and support vector machine (SVM) models, and it shows a higher average recognition rate than these state-of-the-art approaches.
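The training-validation-testing split and the accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the ratios or the evaluation code used, so the 70/15/15 default, the function names `three_way_split` and `accuracy`, and the fixed seed below are all assumptions made for illustration, not the authors' procedure.

```python
import random


def three_way_split(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle items and split them into (train, validation, test) lists.

    The default ratios are an assumption; the paper varies the split
    ratio across experiments but the abstract gives no values.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three fractions that sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Whatever remains after train and validation goes to the test set.
    test = shuffled[n_train + n_val:]
    return train, val, test


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true keyword labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

Calling `three_way_split(recordings, ratios=(0.8, 0.1, 0.1))` on a hypothetical list of recordings would give a different split ratio, which is the kind of variation the experiments describe. In practice the split would usually be stratified per keyword so that each class appears in all three subsets.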

Author information

Corresponding author

Correspondence to Prithviraj Mohanty.

About this article

Cite this article

Mohanty, P., Nayak, A.K. CNN based keyword spotting: An application for context based voiced Odia words. Int. j. inf. tecnol. 14, 3647–3658 (2022). https://doi.org/10.1007/s41870-022-00992-z

  • DOI: https://doi.org/10.1007/s41870-022-00992-z
