Students and teachers learning together: a robust training strategy for neural network pruning

  • Regular Paper
  • Multimedia Systems

Abstract

Convolutional neural networks (CNNs) serve as the backbone for extracting image features in most computer vision tasks. To make them deployable on small devices, researchers have either designed compact networks by hand or compressed large models through pruning. Model pruning is a simple and efficient way to speed up neural networks. However, the pruned model (sparse network) typically performs worse than the original model (dense network) and is harder to train to convergence. Recent work has focused on improving the effectiveness and convergence of sub-networks. In this paper, we instead ask how to narrow the performance gap between sparse and dense networks, rather than how to obtain a better sub-network. To bridge this gap, we propose a novel training strategy based on mutual learning. Furthermore, we introduce a new pruning criterion, matching distance (MD), which aims to let the sparse network inherit most of the knowledge learned by the dense network. Experimental results demonstrate that our approach transfers knowledge from dense networks to sparse networks more efficiently.
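
To make the idea of a dense and a sparse network learning from each other concrete, the following is a minimal sketch in plain Python. It is not the authors' method: the abstract does not specify the loss or define the MD criterion, so the sketch assumes the common mutual-learning formulation in which each network minimises its own cross-entropy plus a KL term toward its peer's prediction, and it uses plain magnitude pruning as a stand-in criterion. All function names and the `alpha` weight are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mutual_learning_losses(dense_logits, sparse_logits, label, alpha=1.0):
    """Return (dense_loss, sparse_loss) for a single sample.

    Each network minimises its own cross-entropy plus a KL term that pulls
    its prediction toward the peer's prediction (the peer is treated as a
    fixed target). alpha is a hypothetical weight, not taken from the paper.
    """
    p_dense = softmax(dense_logits)
    p_sparse = softmax(sparse_logits)
    dense_loss = cross_entropy(p_dense, label) + alpha * kl_divergence(p_sparse, p_dense)
    sparse_loss = cross_entropy(p_sparse, label) + alpha * kl_divergence(p_dense, p_sparse)
    return dense_loss, sparse_loss


def magnitude_mask(weights, sparsity):
    """Binary mask that zeroes the smallest-magnitude fraction of weights.

    Assumption: plain magnitude pruning stands in for the paper's matching
    distance (MD) criterion, which the abstract does not define.
    """
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


if __name__ == "__main__":
    d_loss, s_loss = mutual_learning_losses([2.0, 1.0, 0.1], [1.5, 1.2, 0.3], label=0)
    print("dense loss:", d_loss, "sparse loss:", s_loss)
    print("mask:", magnitude_mask([0.5, -0.1, 0.3, -0.9], sparsity=0.5))
```

When the two networks already agree, both KL terms vanish and each loss reduces to ordinary cross-entropy, so the extra term only acts where the sparse and dense predictions differ.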

Data Availability

All data included in this study are available from the corresponding author upon request.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62067002 and 62062033, and in part by the Science and Technology Program Project of Jiangxi Province Department of Transportation under Grant 2022X0040.

Funding

National Natural Science Foundation of China (62067002, 62062033); Science and Technology Program Project of Jiangxi Province Department of Transportation (2022X0040).

Author information

Contributions

Liyan Xiong agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved; Qingsen Chen made substantial contributions to the conception of the work and drafted the manuscript; Jiawen Huang contributed to the acquisition, analysis, or interpretation of data; Xiaohui Huang revised the manuscript critically for important intellectual content; Peng Huang contributed to the acquisition, analysis, or interpretation of data; Shangfeng Wei contributed to the creation of new software used in the work. All the authors reviewed the manuscript.

Corresponding author

Correspondence to Qingsen Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by F. Wu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, L., Chen, Q., Huang, J. et al. Students and teachers learning together: a robust training strategy for neural network pruning. Multimedia Systems 30, 122 (2024). https://doi.org/10.1007/s00530-024-01315-x

  • DOI: https://doi.org/10.1007/s00530-024-01315-x
