
Structured pruning via feature channels similarity and mutual learning for convolutional neural network compression


Abstract

The deployment of convolutional neural networks (CNNs) on resource-constrained devices has been hindered by their large memory and computational requirements. To obtain a lightweight network, we propose the feature channels similarity and mutual learning fine-tuning (FCS-MLFT) method. To begin with, we focus on the similarity redundancy among the output feature channels of a CNN and propose a novel structured pruning criterion based on cosine similarity: we use K-Means to cluster the convolution kernels, according to the L1 norms of their feature maps, into several bins, and compute the similarity values between feature channels within each bin. Then, unlike the traditional approach of recovering the accuracy of the compressed model with the same strategy used for training, we apply mutual learning fine-tuning (MLFT) to improve the accuracy of the compact model; the proposed method reaches the accuracy of traditional fine-tuning (TFT) while requiring significantly fewer epochs. The experimental results show not only that the FCS criterion outperforms existing criteria, such as kernel norm-based and layer-wise feature norm-based methods, but also that the MLFT strategy reduces the number of epochs required.
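To make the pruning criterion concrete, the following is a minimal sketch of how an FCS-style channel score could be computed for one convolutional layer, assuming PyTorch and scikit-learn. The function name fcs_channel_scores, the bin count, and the use of batch-averaged activations are illustrative assumptions based on the abstract, not the authors' released code.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def fcs_channel_scores(feature_maps: torch.Tensor, n_bins: int = 4) -> torch.Tensor:
    """feature_maps: (N, C, H, W) activations of one conv layer on a small
    calibration batch. Returns one redundancy score per output channel."""
    n, c, h, w = feature_maps.shape
    # Flatten each channel's responses over the batch and spatial dimensions.
    flat = feature_maps.permute(1, 0, 2, 3).reshape(c, -1)        # (C, N*H*W)
    l1_norms = flat.abs().sum(dim=1, keepdim=True)                # (C, 1)

    # Group channels with comparable L1 norms into bins via K-Means.
    labels = KMeans(n_clusters=n_bins, n_init=10).fit_predict(
        l1_norms.detach().cpu().numpy())

    # Within each bin, score a channel by its highest cosine similarity to any
    # other channel of the same bin; high scores indicate similarity redundancy.
    scores = torch.zeros(c)
    unit = F.normalize(flat, dim=1)                               # unit vector per channel
    for b in range(n_bins):
        idx = torch.nonzero(torch.as_tensor(labels) == b).flatten()
        if idx.numel() < 2:
            continue
        sim = unit[idx] @ unit[idx].T                             # pairwise cosine similarity
        sim.fill_diagonal_(-1.0)                                  # ignore self-similarity
        scores[idx] = sim.max(dim=1).values
    return scores  # prune the channels with the largest scores first
```

The mutual learning fine-tuning step can likewise be read as a deep-mutual-learning update in which the pruned network and a peer network each minimize their task loss plus a KL term toward the other's predictions. The sketch below follows that general recipe with hypothetical names; it is not the paper's exact training code.

```python
import torch
import torch.nn.functional as F

def mutual_learning_step(pruned_model, peer_model, x, y,
                         opt_pruned, opt_peer, kl_weight=1.0):
    logits_p = pruned_model(x)
    logits_q = peer_model(x)

    # Each network imitates the other's (detached) class probabilities.
    kl_p = F.kl_div(F.log_softmax(logits_p, dim=1),
                    F.softmax(logits_q.detach(), dim=1), reduction="batchmean")
    kl_q = F.kl_div(F.log_softmax(logits_q, dim=1),
                    F.softmax(logits_p.detach(), dim=1), reduction="batchmean")

    loss_p = F.cross_entropy(logits_p, y) + kl_weight * kl_p
    loss_q = F.cross_entropy(logits_q, y) + kl_weight * kl_q

    opt_pruned.zero_grad(); loss_p.backward(); opt_pruned.step()
    opt_peer.zero_grad(); loss_q.backward(); opt_peer.step()
    return loss_p.item(), loss_q.item()
```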



Author information

Authors and Affiliations

Authors

Contributions

Wei Yang contributed to paper writing. Wei Yang and Yancai Xiao contributed to the whole revision process. All authors have read and agreed to the submitted version of the manuscript.

Corresponding author

Correspondence to Yancai Xiao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China under Grant 51577008.


About this article


Cite this article

Yang, W., Xiao, Y. Structured pruning via feature channels similarity and mutual learning for convolutional neural network compression. Appl Intell 52, 14560–14570 (2022). https://doi.org/10.1007/s10489-022-03403-9

