
International Journal of Speech Technology, Volume 22, Issue 4, pp 1123–1133

PSO-based optimized CNN for Hindi ASR

  • Vishal Passricha
  • Rajesh Kumar Aggarwal
Article

Abstract

The Convolutional Neural Network (CNN) is one of the most successful deep learning algorithms and has proven effective in a variety of vision tasks. The performance of this network depends directly on its hyperparameters. However, designing CNN architectures requires either expert knowledge of their intrinsic structure or a great deal of trial and error. To overcome these issues, there is a need to design optimal CNN architectures automatically, without human intervention. We therefore remove the constraints that traditional architectures place on the number and type of convolutional and pooling layers. Biologically inspired approaches have not been extensively exploited for this task. This paper attempts to automatically optimize the hyperparameters of a CNN architecture for the speech recognition task using particle swarm optimization (PSO), a population-based stochastic optimization technique. The proposed method is evaluated by designing a CNN architecture for speech recognition on a Hindi dataset. The experimental results show that the proposed method designs competitive CNN architectures that perform on par with other state-of-the-art methods.
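
To make the approach concrete, the following is a minimal sketch of PSO applied to a hyperparameter search, assuming a toy three-dimensional search space (number of convolutional layers, filters per layer, kernel size) and a placeholder fitness function; the paper's actual search space, encoding, and Hindi ASR training pipeline are not reproduced here. It shows the standard PSO velocity and position updates (inertia, cognitive, and social terms) that drive the search.

```python
# Minimal PSO sketch for CNN hyperparameter search.
# BOUNDS and fitness() are illustrative placeholders, not the paper's setup.
import random

# Each dimension is one hyperparameter: (lower bound, upper bound).
# Hypothetical space: [num conv layers, filters per layer, kernel size].
BOUNDS = [(1, 6), (16, 256), (2, 9)]

def fitness(position):
    """Stand-in for validation accuracy of a CNN trained with these
    hyperparameters; a real run would train and score the network."""
    layers, filters, kernel = (round(p) for p in position)
    return -((layers - 4) ** 2 + (filters - 128) ** 2 / 100 + (kernel - 5) ** 2)

def pso(num_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5):
    dim = len(BOUNDS)
    # Initialise particle positions uniformly in bounds, velocities at zero.
    pos = [[random.uniform(lo, hi) for lo, hi in BOUNDS]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]                   # per-particle best positions
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(num_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm-wide best

    for _ in range(iters):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it inside the search space.
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    BOUNDS[d][0]), BOUNDS[d][1])
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [round(x) for x in gbest], gbest_fit

if __name__ == "__main__":
    best, score = pso()
    print("best hyperparameters:", best, "fitness:", score)
```

In a real architecture search, fitness() would train the candidate CNN on the speech corpus and return its validation accuracy, which makes each fitness evaluation expensive and motivates the small swarm and iteration counts typical of such searches.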

Keywords

CNN · Hyperparameter selection · PSO · Optimization

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. National Institute of Technology, Kurukshetra, India
