Ensemble Knowledge Guided Sub-network Search and Fine-Tuning for Filter Pruning

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13671)


Abstract

Conventional NAS-based pruning algorithms aim to find the sub-network with the best validation performance. However, validation performance does not reliably reflect test performance, i.e., potential performance. Moreover, although fine-tuning the pruned network to recover the performance drop is unavoidable, few studies have addressed this issue. This paper proposes a novel Ensemble Knowledge Guidance (EKG) that solves both problems at once. First, we experimentally show that the fluctuation of the loss landscape is an effective metric for evaluating potential performance. To search for a sub-network with the smoothest loss landscape at low cost, we employ EKG as the search reward. The EKG used for the following search iteration is composed of the ensemble knowledge of interim sub-networks, i.e., the by-products of the sub-network evaluation. Next, we reuse EKG to give the pruned network gentle and informative guidance during fine-tuning. Since EKG is implemented as a memory bank in both phases, its cost is negligible. For example, in the case of ResNet-50, only 315 GPU hours are required to remove about 45.04% of the FLOPS without any performance degradation, and the whole process can run even on a low-spec workstation. The source code is available here.
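As a rough illustration of the mechanism outlined in the abstract, the sketch below shows one plausible way to realize ensemble knowledge guidance: predictions of interim sub-networks are accumulated in a memory bank, their average serves as soft targets, and the same loss can score search candidates and guide fine-tuning. This is only a minimal sketch under our own assumptions (PyTorch, and the names MemoryBank, ekg_loss, and the weighting alpha are all hypothetical), not the authors' implementation.

```python
# Minimal sketch of Ensemble Knowledge Guidance (EKG) as described in the
# abstract: interim sub-networks' predictions are stored in a memory bank,
# and their ensemble guides both sub-network evaluation and fine-tuning.
# NOT the paper's implementation; names and the loss weighting are assumptions.

import torch
import torch.nn.functional as F


class MemoryBank:
    """Accumulates predictions of interim sub-networks on a fixed batch set."""

    def __init__(self):
        self._logits = []  # each entry: tensor of shape (N, num_classes)

    def add(self, logits: torch.Tensor) -> None:
        self._logits.append(logits.detach().cpu())

    def ensemble(self) -> torch.Tensor:
        # Ensemble knowledge = average class probability over stored sub-networks.
        probs = torch.stack([F.softmax(l, dim=1) for l in self._logits])
        return probs.mean(dim=0)  # (N, num_classes)


def ekg_loss(logits: torch.Tensor,
             ensemble_probs: torch.Tensor,
             labels: torch.Tensor,
             alpha: float = 0.5) -> torch.Tensor:
    """Cross-entropy on labels plus KL divergence toward the ensemble knowledge."""
    ce = F.cross_entropy(logits, labels)
    kd = F.kl_div(F.log_softmax(logits, dim=1), ensemble_probs,
                  reduction="batchmean")
    return (1.0 - alpha) * ce + alpha * kd
```

In this reading, each evaluated interim sub-network deposits its predictions into the bank during the search, a candidate's reward can be taken as the negative ekg_loss on a held-out batch, and during fine-tuning the pruned network is trained with the same loss so the stored ensemble acts as an inexpensive teacher.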

Acknowledgments

This work was supported by IITP grants (No. 2021-0-02068, AI Innovation Hub and RS-2022-00155915, Artificial Intelligence Convergence Research Center (Inha University)), and the NRF grants (No. 2022R1A2C2010095 and No. 2022R1A4A1033549) funded by the Korea government (MSIT).

Author information

Corresponding author

Correspondence to Byung Cheol Song.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 208 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lee, S., Song, B.C. (2022). Ensemble Knowledge Guided Sub-network Search and Fine-Tuning for Filter Pruning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_34

  • DOI: https://doi.org/10.1007/978-3-031-20083-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20082-3

  • Online ISBN: 978-3-031-20083-0

  • eBook Packages: Computer Science, Computer Science (R0)
