
Effective SVD-Based Deep Network Compression for Automatic Speech Recognition

  • Hao Fu
  • Yue Ming
  • Yibo Jiang
  • Chunxiao Fan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11264)

Abstract

Neural networks significantly improve speech recognition performance, but their large number of parameters incurs high computation and memory costs. To address this problem, we propose an efficient network compression method based on Singular Value Decomposition (SVD): Simultaneous Iterative SVD Reconstruction via Loss Sensitive Update (SISVD-LU). First, we analyse the singular values of the weight matrices to estimate the sparsity of each layer, and then apply SVD to the sparsest layer to factorize its weight matrix into two or more smaller matrices with minimal reconstruction error. Second, we reconstruct the model with our Loss Sensitive Update strategy, which propagates the error across layers. Finally, we employ a Simultaneous Iterative Compression scheme, which factorizes all layers at once and then iteratively minimizes the model size while preserving accuracy. We evaluate the proposed approach on two LVCSR datasets, AISHELL and TIMIT. On the AISHELL Mandarin dataset, we obtain a 50% compression ratio on a single layer while maintaining almost the same accuracy. With the update strategy, our simultaneous iterative compression further boosts the compression ratio, ultimately reducing the model size by 43%. Similar results are obtained on TIMIT; in both cases the accuracy loss is slight.
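
To make the factorization step concrete, below is a minimal NumPy sketch of rank-truncated SVD compression of a single fully connected weight matrix. It illustrates the general technique only, not the authors' exact procedure; the energy-based rank selection, the function name svd_compress, and the layer size are illustrative assumptions.

    import numpy as np

    def svd_compress(W, energy=0.5):
        """Factorize W (m x n) into A (m x k) and B (k x n) so that W ~= A @ B,
        keeping the top-k singular values that cover the requested fraction of
        the total singular-value energy (rank selection is an assumption here)."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        # smallest rank k whose leading singular values reach the energy target
        k = int(np.searchsorted(np.cumsum(s) / s.sum(), energy)) + 1
        A = U[:, :k] * s[:k]   # (m, k): singular values folded into the left factor
        B = Vt[:k, :]          # (k, n)
        return A, B

    # example: a hypothetical 1024 x 1024 layer replaced by two 1024 x k matrices
    W = np.random.randn(1024, 1024).astype(np.float32)
    A, B = svd_compress(W, energy=0.5)
    rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
    print(A.shape, B.shape, f"relative reconstruction error: {rel_err:.3f}")

Replacing the m x n matrix W with the factors A and B saves parameters whenever k(m + n) < mn, which is what makes a lower chosen rank translate into a higher compression ratio.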

Keywords

Speech recognition · SVD-based compression · Loss sensitive update · Simultaneous iteration


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Beijing University of Posts and Telecommunications, Beijing, China
  2. Ningbo Xitang Technologies Inc., Ningbo, China
