Compressing Deep Neural Network

  • Shiming He
  • Zhuozhou Li
  • Jin Wang
  • Kun Xie
  • Dafang Zhang
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 536)


Deep learning is one of the most useful tools for many applications, such as image recognition and natural language processing. However, large models require huge computation power and millions of parameters, which may exceed the computing and storage resources available. To address this problem, some works have tried to compress the dense weight matrices with sparse representation techniques, such as matrix decomposition and tensor decomposition. However, the largest achievable compression ratio is still unknown. Therefore, in this paper, we analyse the relationship between the shape of the tensor and the number of parameters, formulate the problem of minimizing the number of parameters, and solve it to find the best compression ratio. We compare the compression ratios on three data sets.
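To illustrate how the shape of a factorization relates to the number of parameters, the following is a minimal sketch in Python. It is not the formulation used in the paper, which this abstract does not reproduce. It only counts parameters for a dense layer, a rank-r matrix factorization, and a tensor-train style factorization of the weight matrix. The layer size (784 x 1024), the mode shapes, the ranks, and all function names are assumptions chosen for illustration.

```python
def dense_params(m, n):
    """Parameters in a dense m x n weight matrix."""
    return m * n


def low_rank_params(m, n, r):
    """Parameters when W (m x n) is replaced by U (m x r) times V (r x n)."""
    return r * (m + n)


def tt_matrix_params(in_modes, out_modes, ranks):
    """Parameters of a tensor-train factorization of a weight matrix.

    The m x n matrix is reshaped so that m = prod(in_modes) and
    n = prod(out_modes). Core k has shape
    ranks[k] x (in_modes[k] * out_modes[k]) x ranks[k + 1],
    and the boundary ranks are fixed to 1.
    """
    if len(in_modes) != len(out_modes) or len(ranks) != len(in_modes) + 1:
        raise ValueError("need one more rank than there are modes")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


def compression_ratio(original, compressed):
    """Ratio of original to compressed parameter counts (larger is better)."""
    return original / compressed


if __name__ == "__main__":
    # Hypothetical fully connected layer: 784 inputs, 1024 outputs.
    dense = dense_params(784, 1024)

    # Same layer, same internal rank 8, two different tensor shapes.
    two_modes = tt_matrix_params((28, 28), (32, 32), (1, 8, 1))
    four_modes = tt_matrix_params((4, 7, 7, 4), (4, 8, 8, 4), (1, 8, 8, 8, 1))

    print("dense:", dense)
    print("rank-16 matrix factorization:", low_rank_params(784, 1024, 16))
    print("tensor-train, 2 modes:", two_modes, compression_ratio(dense, two_modes))
    print("tensor-train, 4 modes:", four_modes, compression_ratio(dense, four_modes))
```

With the same rank, the four-mode shape needs fewer parameters than the two-mode shape in this example, which is the kind of dependence on tensor shape that the abstract refers to. The sketch says nothing about the accuracy after compression, which has to be measured separately.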


Deep neural network, Parameters compressing, Matrix decomposition, Tensor decomposition



This work was supported by the National Natural Science Foundation of China (Nos. 61802030, 61572184, 61502054), the Science and Technology Projects of Hunan Province (No. 2016JC2075), and the Research Foundation of Education Bureau of Hunan Province, China (Nos. 16C0047, 16B085).



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Shiming He (1), email author
  • Zhuozhou Li (1)
  • Jin Wang (1)
  • Kun Xie (2)
  • Dafang Zhang (2)
  1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
  2. College of Computer Science and Electronics Engineering, Hunan University, Changsha, China
