Dynamic and Adaptive Threshold for DNN Compression from Scratch

  • Chunhui Jiang
  • Guiying Li
  • Chao Qian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10593)


Despite their great success, deep neural networks (DNNs) are hard to deploy on devices with limited hardware, such as mobile phones, because of their massive number of parameters. Many methods have been proposed for DNN compression, i.e., to reduce the number of parameters of DNN models. However, almost all of them rely on a reference model that must first be trained to convergence. In this paper, we propose an approach that performs DNN training and compression simultaneously. More concretely, a dynamic and adaptive threshold (DAT) framework prunes a DNN gradually by changing the pruning threshold during training. Experiments show that DAT not only reaches compression rates comparable to or better than state-of-the-art DNN compression methods with almost no loss of accuracy, but also beats DNN sparse training methods by a large margin.
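As a rough illustration of the idea, the sketch below applies magnitude pruning with a threshold that grows over the course of training, so the network is pruned gradually while being trained from scratch. The linear schedule `dat_threshold`, the cap `t_max`, and the toy loop are illustrative assumptions for exposition, not the paper's actual DAT rule:

```python
import numpy as np

def dat_threshold(epoch, total_epochs, t_max=0.05):
    """Hypothetical dynamic threshold schedule: ramps linearly from 0 to t_max.

    The paper's actual adaptive schedule may differ; this is only a sketch.
    """
    return t_max * min(1.0, epoch / total_epochs)

def prune_step(weights, threshold):
    """Zero out weights whose magnitude falls below the current threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Toy loop: the threshold starts at 0 (no pruning) and grows each epoch,
# so more and more small-magnitude weights are removed during training.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=1000)
total_epochs = 10
for epoch in range(total_epochs):
    t = dat_threshold(epoch, total_epochs)
    weights, mask = prune_step(weights, t)
    # (a real implementation would interleave gradient updates here)
print(f"remaining fraction of weights: {mask.mean():.2f}")
```

In a real training run the pruning mask would be applied alongside gradient updates each iteration, so surviving weights keep adapting as the threshold tightens.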


Keywords: Deep neural networks · Pruning · DNN compression



Acknowledgments

We thank the reviewers for their valuable comments. This work was supported by the NSFC (U1605251, U1613216), the Young Elite Scientists Sponsorship Program by CAST (2016QNRC001), the CCF-Tencent Open Research Fund, and the Royal Society Grant on “Data Driven Metaheuristic Search”.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computer Science and Technology, USTC-Birmingham Joint Research Institute in Intelligent Computation and Its Applications (UBRI), University of Science and Technology of China, Hefei, People’s Republic of China
