Students and teachers learning together: a robust training strategy for neural network pruning

Xiong, Liyan; Chen, Qingsen; Huang, Jiawen; Huang, Xiaohui; Huang, Peng; Wei, Shangfeng

doi:10.1007/s00530-024-01315-x

Regular Paper
Published: 12 April 2024

Volume 30, article number 122 (2024)

Multimedia Systems

Liyan Xiong¹,
Qingsen Chen¹,
Jiawen Huang¹,
Xiaohui Huang¹,
Peng Huang² &
…
Shangfeng Wei³

Abstract

Convolutional neural networks (CNNs) serve as the backbone for extracting image features in most computer vision tasks. To make them deployable on small devices, researchers have either designed small networks by hand or compressed large models through pruning. Model pruning is a simple and efficient way to speed up neural networks. However, the pruned model (sparse network) performs worse than the original model (dense network) and is difficult to train to convergence. Recent work has focused on improving the effectiveness and convergence of sub-networks. In this paper, we instead ask how to narrow the performance gap between sparse and dense networks, rather than how to obtain a better sub-network. To bridge this gap, we propose a novel training strategy based on mutual learning. Furthermore, we introduce a new pruning criterion, matching distance (MD), which aims to let the sparse network inherit most of the knowledge learned by the dense network. Experimental results demonstrate that our approach transfers knowledge from dense networks to sparse networks more efficiently.
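The abstract names mutual learning between the dense and sparse networks but does not give the objective. As a rough illustration only, the sketch below uses the generic mutual-learning formulation, in which each network minimises its own cross-entropy plus a KL term towards its peer's prediction. The function names, the `weight` parameter, and the per-sample setup are assumptions made for this example, not the authors' implementation, and the matching distance criterion is not sketched because it is not defined in this excerpt.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_learning_losses(dense_logits, sparse_logits, label, weight=1.0):
    """Return (dense_loss, sparse_loss) for one sample.

    Hypothetical helper: each network's loss is its own cross-entropy
    plus `weight` times the KL divergence from its peer's prediction
    to its own prediction (generic mutual learning, assumed here).
    """
    p_dense = softmax(dense_logits)
    p_sparse = softmax(sparse_logits)
    dense_loss = cross_entropy(p_dense, label) + weight * kl_divergence(p_sparse, p_dense)
    sparse_loss = cross_entropy(p_sparse, label) + weight * kl_divergence(p_dense, p_sparse)
    return dense_loss, sparse_loss


# Example: three-class logits from a dense and a pruned (sparse) network.
dense_loss, sparse_loss = mutual_learning_losses(
    [2.0, 0.5, -1.0], [1.0, 0.8, -0.5], label=0
)
```

In this formulation neither network is a fixed teacher: both losses would be backpropagated in the same training step, so the dense network also adapts to the sparse one. When the two networks agree exactly, the KL terms vanish and each loss reduces to plain cross-entropy.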

Data Availability

All data included in this study are available upon request from the corresponding author.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62067002 and 62062033, and in part by the Science and Technology Program Project of Jiangxi Province Department of Transportation under Grant 2022X0040.

Funding

National Natural Science Foundation of China (62067002, 62062033); Science and Technology Program Project of Jiangxi Province Department of Transportation (2022X0040).

Author information

Authors and Affiliations

School of Information and Engineering, East China Jiaotong University, Nanchang, 330000, Jiangxi, China
Liyan Xiong, Qingsen Chen, Jiawen Huang & Xiaohui Huang
Road Network Operation Management Company, Jiangxi Provincial Communications Investment Group Co., Ltd, Nanchang, 330000, Jiangxi, China
Peng Huang
Guangdong CAS Cogniser Information Technology Co., Ltd, Guangzhou, 510000, Guangdong, China
Shangfeng Wei

Contributions

Liyan Xiong agrees to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved; Qingsen Chen made substantial contributions to the conception of the work and drafted it; Jiawen Huang contributed to the acquisition, analysis, or interpretation of data; Xiaohui Huang revised the work critically for important intellectual content; Peng Huang contributed to the acquisition, analysis, or interpretation of data; Shangfeng Wei contributed to the creation of new software used in the work. All the authors reviewed the manuscript.

Corresponding author

Correspondence to Qingsen Chen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by F. Wu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, L., Chen, Q., Huang, J. et al. Students and teachers learning together: a robust training strategy for neural network pruning. Multimedia Systems 30, 122 (2024). https://doi.org/10.1007/s00530-024-01315-x

Received: 07 November 2023
Accepted: 06 March 2024
Published: 12 April 2024
DOI: https://doi.org/10.1007/s00530-024-01315-x
