Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition

Zhan, Hongjian; Chen, Guilin; Lu, Yue

doi:10.1007/978-981-10-3005-5_35

Conference paper
First Online: 22 October 2016

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Abstract

Batch normalization has shown success in image classification and other image processing tasks by reducing internal covariate shift during the training of deep network models. In this paper, we propose applying batch normalization to speech recognition within the hybrid NN-HMM model. We evaluate the method in the acoustic model of the hybrid system on a speaker-independent speech recognition task using several Chinese datasets. Compared with the best model we previously used on these datasets, batch normalization yields a relative word error rate (WER) reduction of 8%–13%, while needing only 60% of the iterations the original model requires to finish training.
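
As a rough illustration of the two quantities the abstract relies on, the sketch below shows a training-mode batch normalization step in its standard formulation (normalize each feature over the mini-batch, then apply a learned scale and shift) and the arithmetic behind a relative WER reduction. It is not the authors' implementation: the function names, the `eps` value, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over a mini-batch of feature vectors.

    batch: list of equal-length lists (one row per example).
    gamma, beta: per-feature scale and shift (learned in a real network).
    eps: small constant for numerical stability (assumed value).
    """
    n = len(batch)
    dim = len(batch[0])
    # Per-feature mean over the mini-batch.
    mean = [sum(row[j] for row in batch) / n for j in range(dim)]
    # Per-feature biased variance over the mini-batch.
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(dim)]
    # Normalize, then scale and shift each feature.
    return [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
         for j in range(dim)]
        for row in batch
    ]


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction: the fraction of the baseline WER that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical hidden-layer activations: 3 frames, 2 features on different scales.
activations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(activations, gamma=[1.0, 1.0], beta=[0.0, 0.0])

# Hypothetical WERs (not from the paper): 20.0% down to 18.0% is a 10% relative reduction.
reduction = relative_wer_reduction(20.0, 18.0)
```

With `gamma` set to 1 and `beta` to 0, each feature column of `normalized` has approximately zero mean and unit variance regardless of its original scale. At inference time, implementations typically replace the mini-batch statistics with running averages collected during training; that step is omitted here.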

Acknowledgment

This work is jointly supported by the Science and Technology Commission of Shanghai Municipality under research grants 14511105500 and 14DZ2260800.

Author information

Authors and Affiliations

Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, Shanghai, 200241, China
Hongjian Zhan & Yue Lu
Shanghai Youngtone Technology Co., Ltd, Shanghai, China
Guilin Chen

Corresponding author

Correspondence to Hongjian Zhan.

Editor information

Editors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China
Xuelong Li
Chinese Academy of Sciences, Institute of Computing Technology, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jie Zhou
Nanjing University of Science and Technology, Nanjing, China
Jian Yang
University of Electronic Science and Technology, Chengdu, Sichuan, China
Hong Cheng

About this paper

Cite this paper

Zhan, H., Chen, G., Lu, Y. (2016). Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_35

DOI: https://doi.org/10.1007/978-981-10-3005-5_35
Published: 22 October 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)
