Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Abstract

Batch normalization has shown success in image classification and other image processing tasks by reducing internal covariate shift during the training of deep networks. In this paper, we propose applying batch normalization to speech recognition within the hybrid NN-HMM model. We evaluate the method in the acoustic model of the hybrid system on a speaker-independent speech recognition task using Chinese datasets. Compared with our previous best model on these datasets, batch normalization yields a relative word error rate (WER) reduction of 8%–13%, while requiring only 60% of the original model's training iterations.
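For readers unfamiliar with the technique, the following is a minimal sketch of the training-time batch normalization transform in the form described by Ioffe and Szegedy (2015). It is not the authors' acoustic-model implementation: the function name `batch_norm`, the `eps` value, and the sample activations are illustrative assumptions, and the code uses plain Python lists rather than any particular toolkit.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over a mini-batch, then scale and shift.

    batch: list of frames, each a list of feature activations.
    gamma, beta: per-feature learned scale and shift (lists of floats).
    eps: small constant for numerical stability (assumed value).
    """
    n = len(batch)
    dim = len(batch[0])
    # Per-feature mean over the mini-batch.
    mean = [sum(row[j] for row in batch) / n for j in range(dim)]
    # Per-feature biased variance over the mini-batch.
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(dim)]
    # Normalize each activation, then apply the learned affine transform.
    return [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
         for j in range(dim)]
        for row in batch
    ]


# Hypothetical activations: 4 frames, 2 hidden units on very different scales.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = batch_norm(frames, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to one and `beta` to zero, each column of `normalized` has approximately zero mean and unit variance regardless of its original scale. At inference time, implementations typically replace the mini-batch statistics with averages accumulated during training; that step is omitted here.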

Acknowledgment

This work is jointly supported by the Science and Technology Commission of Shanghai Municipality under research grants 14511105500 and 14DZ2260800.

Corresponding author

Correspondence to Hongjian Zhan.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhan, H., Chen, G., Lu, Y. (2016). Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_35

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

  • eBook Packages: Computer Science, Computer Science (R0)
