Group Pruning Using a Bounded-\(\ell _p\) Norm for Group Gating and Regularization

  • Chaithanya Kumar Mummadi
  • Tim Genewein
  • Dan Zhang
  • Thomas Brox
  • Volker Fischer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11824)


Deep neural networks achieve state-of-the-art results on several tasks while increasing in complexity. It has been shown that neural networks can be pruned during training by imposing sparsity-inducing regularizers. In this paper, we investigate two techniques for group-wise pruning during training in order to improve network efficiency. We propose a gating factor after every convolutional layer to induce channel-level sparsity, encouraging insignificant channels to become exactly zero. Further, we introduce and analyse a bounded variant of the \(\ell _1\) regularizer, which interpolates between the \(\ell _1\)- and \(\ell _0\)-norms to retain the performance of the network at higher pruning rates. To underline the effectiveness of the proposed methods, we show that the number of parameters of ResNet-164, DenseNet-40 and MobileNetV2 can be reduced by \(30\%\), \(69\%\), and \(75\%\), respectively, on CIFAR100 without a significant drop in accuracy. We achieve state-of-the-art pruning results for ResNet-50 with higher accuracy on ImageNet. Furthermore, we show that the lightweight MobileNetV2 can be compressed further on ImageNet without a significant drop in performance.
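
The two ideas in the abstract can be illustrated with a minimal sketch. The abstract states only that the regularizer is a bounded variant of \(\ell _1\) that interpolates between \(\ell _1\) and \(\ell _0\); it does not give the formula. The penalty \(1 - \exp(-|g|/\sigma)\) per gate, the scale parameter `sigma`, and all function names below are assumptions chosen for illustration, not necessarily the authors' definitions.

```python
import math


def bounded_l1(gates, sigma):
    """Assumed bounded penalty: each term is 0 for a zero gate and below 1 otherwise."""
    return sum(1.0 - math.exp(-abs(g) / sigma) for g in gates)


def l0_norm(gates):
    """Number of non-zero gates, i.e. channels that survive pruning."""
    return sum(1 for g in gates if g != 0.0)


def l1_norm(gates):
    """Sum of absolute gate values."""
    return sum(abs(g) for g in gates)


def apply_gates(channels, gates):
    """Multiply every activation of a channel by that channel's gate.

    A gate that is exactly zero silences its channel, so the channel
    (and the filter producing it) can be removed after training.
    """
    return [[g * a for a in channel] for channel, g in zip(channels, gates)]


# Hypothetical gate values for a layer with five channels.
gates = [0.0, 0.5, -2.0, 0.0, 1.0]

# Small sigma: every non-zero gate costs about 1, so the penalty counts channels.
near_l0 = bounded_l1(gates, sigma=1e-3)

# Large sigma: 1 - exp(-x) is close to x for small x, so the penalty
# rescaled by sigma is close to the l1 norm.
near_l1 = bounded_l1(gates, sigma=1e3) * 1e3

print(near_l0, l0_norm(gates))
print(near_l1, l1_norm(gates))
```

Under this assumed form, `sigma` controls the interpolation: small values make the penalty behave like a channel count, while large values make it behave like a rescaled \(\ell _1\) penalty.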



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany
  2. University of Freiburg, Freiburg im Breisgau, Germany
