Acoustic Model Compression with Knowledge Transfer

  • Jiangyan Yi
  • Jianhua Tao
  • Zhengqi Wen
  • Ya Li
  • Hao Ni
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 807)

Abstract

Mobile devices have limited computing power and memory, so large deep neural network (DNN) based acoustic models are not well suited for deployment on them. To alleviate this problem, this paper proposes to compress acoustic models by using knowledge transfer. This approach forces a large teacher model to transfer generalized knowledge to a small student model. The student model is trained with a linear interpolation of hard probabilities and soft probabilities, so that it learns generalized knowledge from the teacher model. The hard probabilities are generated by a Gaussian mixture model hidden Markov model (GMM-HMM) system; the soft probabilities are computed by a teacher model (DNN or RNN). Experiments on the AMI corpus show that a small student model obtains a 2.4% relative WER improvement over a large teacher model, with a compression ratio of almost 7.6.
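The interpolated training criterion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the interpolation weight `lam`, and the temperature `T` are assumptions chosen for clarity; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T yields "softer" probabilities.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def interpolated_loss(student_logits, hard_label, teacher_logits,
                      lam=0.5, T=2.0):
    """Linear interpolation of hard-label and soft-label cross-entropy.

    hard_label      -- target senone index from the GMM-HMM alignment
    teacher_logits  -- pre-softmax outputs of the large teacher model
    lam, T          -- illustrative hyperparameter values (assumptions)
    """
    # Cross-entropy against the hard (one-hot) GMM-HMM target.
    hard_ce = -np.log(softmax(student_logits)[hard_label])
    # Cross-entropy against the teacher's soft distribution,
    # with both distributions smoothed by the same temperature.
    p_teacher = softmax(teacher_logits, T)
    soft_ce = -np.sum(p_teacher * np.log(softmax(student_logits, T)))
    return (1.0 - lam) * hard_ce + lam * soft_ce
```

With `lam = 0` the criterion reduces to ordinary cross-entropy training on the GMM-HMM alignments; with `lam = 1` the student is trained purely to match the teacher's soft output distribution.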

Keywords

Model compression · Knowledge transfer · Deep neural networks · Automatic speech recognition

Notes

Acknowledgements

This work is supported by the National High-Tech Research and Development Program of China (863 Program) (No. 2015AA016305), the National Natural Science Foundation of China (NSFC) (No. 61425017, No. 61403386).

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Jiangyan Yi¹,²
  • Jianhua Tao¹,²,³
  • Zhengqi Wen¹
  • Ya Li¹
  • Hao Ni¹,²

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  3. CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China