Training Maxout Neural Networks for Speech Recognition Tasks

  • Aleksey Prudnikov
  • Maxim Korenevsky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)

Abstract

This paper addresses the training of deep neural networks with tunable piecewise-linear activation functions, known as "maxout", for speech recognition tasks. Maxout networks are compared with conventional fully-connected DNNs when trained with both the cross-entropy and the sequence-discriminative (sMBR) criteria. Experiments are carried out on the CHiME Challenge 2015 corpus of multi-microphone noisy dictation speech and on the Switchboard corpus of conversational telephone speech. A clear advantage of maxout networks over DNNs is demonstrated when the cross-entropy criterion is used, on both corpora. It is also argued that maxout networks are prone to overfitting during sequence training, but that in some cases this can be successfully overcome with KL-divergence-based regularization.
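For context, a maxout unit computes several affine projections ("pieces") of its input and outputs their elementwise maximum, so the activation shape is learned rather than fixed. A minimal NumPy sketch of one maxout layer follows; all shapes and names are illustrative and are not taken from the authors' implementation:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: each of the m output units takes the max over k
    affine pieces.  Shapes (illustrative): x (d,), W (k, m, d), b (k, m)."""
    # z[j, i] = W[j, i] . x + b[j, i], where j indexes pieces, i output units
    z = np.einsum('jid,d->ji', W, x) + b
    # Elementwise max over the k pieces gives a piecewise-linear activation.
    return z.max(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)                # d = 5 inputs
W = rng.standard_normal((3, 4, 5))        # k = 3 pieces, m = 4 output units
b = rng.standard_normal((3, 4))
h = maxout(x, W, b)                       # shape (4,)
```

With k = 2 pieces and suitable weights, a maxout unit can recover ReLU (max(0, wx + b)) or the absolute value, which is one reason the paper treats it as a more flexible alternative to fixed activations.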

Keywords

Dropout · Regularization · Rectified linear units · Maxout · Cross-entropy · Sequence training · CHiME · Switchboard · KL-divergence

Notes

Acknowledgments

This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0033 (ID RFMEFI57514X0033).


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. ITMO University, Saint Petersburg, Russia
  2. Speech Technology Center, Saint Petersburg, Russia
