
DHP: Differentiable Meta Pruning via HyperNetworks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

Network pruning has been a driving force for the acceleration of neural networks and for reducing the burden of model storage and transmission. With the advent of AutoML and neural architecture search (NAS), pruning has become topical through automatic mechanisms and search-based architecture optimization. Yet, current automatic designs rely on either reinforcement learning or evolutionary algorithms. Because those algorithms are not differentiable, the pruning procedure requires a long search stage before it converges.

To circumvent this problem, this paper introduces a differentiable pruning method via hypernetworks for automatic network pruning. The specifically designed hypernetworks take latent vectors as input and generate the weight parameters of the backbone network. The latent vectors control the output channels of the convolutional layers in the backbone network and act as handles for pruning those layers. By enforcing \(\ell _1\) sparsity regularization on the latent vectors and using a proximal gradient solver, sparse latent vectors can be obtained. Passing the sparsified latent vectors through the hypernetworks, the corresponding slices of the generated weight parameters can be removed, achieving the effect of network pruning. The latent vectors of all the layers are pruned jointly, resulting in an automatic per-layer configuration. Extensive experiments on various networks for image classification, single-image super-resolution, and denoising validate the proposed method. Code is available at https://github.com/ofsoundof/dhp.
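To make the mechanism concrete, below is a minimal PyTorch sketch (not the authors' implementation; see the linked repository for that) of the two ingredients described above: a latent vector whose entries gate the output channels of a generated convolution weight, and an \(\ell _1\) proximal (soft-thresholding) step that drives latent entries to exactly zero. The names LatentConvHypernet and proximal_l1_step are illustrative, and the hypernetwork is reduced to a per-channel scaling of a learned weight template rather than the paper's full design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentConvHypernet(nn.Module):
    """Generates a conv weight from a latent vector.

    Each entry of `latent` gates one output channel: if latent[i] == 0,
    the i-th slice of the generated weight is zero and can be pruned.
    """

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.latent = nn.Parameter(torch.ones(out_channels))
        # Per-channel weight templates; the latent vector scales them.
        self.template = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.01
        )

    def forward(self, x):
        # Broadcast the latent vector over the output-channel dimension.
        weight = self.latent.view(-1, 1, 1, 1) * self.template
        return F.conv2d(x, weight, padding=self.template.shape[-1] // 2)

@torch.no_grad()
def proximal_l1_step(latent, lr, lam):
    """Soft-thresholding: prox of lam*||z||_1 after a gradient step of size lr."""
    latent.copy_(torch.sign(latent) * torch.clamp(latent.abs() - lr * lam, min=0.0))

# Toy usage: alternate a gradient step on the task loss with the proximal step.
layer = LatentConvHypernet(in_channels=3, out_channels=8, kernel_size=3)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(2, 3, 16, 16)
for _ in range(10):
    loss = layer(x).pow(2).mean()   # placeholder task loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    proximal_l1_step(layer.latent, lr=0.1, lam=0.05)

kept = (layer.latent.abs() > 0).sum().item()
print(f"channels kept after sparsification: {kept}/8")

In this toy setting the zeroed latent entries identify the output channels whose generated weight slices can be removed; the paper applies the same idea with full hypernetworks across all layers so that the surviving latent entries jointly determine the pruned architecture.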

Keywords

Network pruning · Hypernetworks · Meta learning · Differentiable optimization · Proximal gradient

Notes

Acknowledgements

This work was partly supported by the ETH Zürich Fund (OK), a Huawei Technologies Oy (Finland) project, an Amazon AWS grant, and an Nvidia grant.

Supplementary material

504445_1_En_36_MOESM1_ESM.pdf (236 kb)
Supplementary material 1 (pdf 235 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Computer Vision Lab, ETH Zürich, Zürich, Switzerland
  2. The University of Sydney, Sydney, Australia
  3. KU Leuven, Leuven, Belgium
