
Improving the accuracy of pruned network using knowledge distillation

  • Short Paper
  • Published in Pattern Analysis and Applications

Abstract

The introduction of convolutional neural networks (CNNs) to the image processing field has attracted researchers to explore their applications. Several network designs have been proposed to reach state-of-the-art capability. However, current neural network designs still suffer from an issue related to model size, so some researchers have introduced techniques to reduce or compress the model. A compression technique may degrade the accuracy of the compressed model relative to the original one and may also affect its performance, so a new scheme is needed to enhance the accuracy of the compressed network. In this study, we show that knowledge distillation (KD) can be integrated with one of the pruning methodologies, namely filter pruning, as the compression technique to enhance the accuracy of the pruned model. From all experimental results, we conclude that incorporating KD when building a MobileNets model can enhance the accuracy of the pruned network with almost no increase in inference time: the model trained with KD is only 0.1 s slower at inference than the one trained without KD. Furthermore, when the model size is reduced by 26.08%, the accuracy without KD is 63.65%, whereas incorporating KD raises it to 65.37%.
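
To make the approach concrete, below is a minimal sketch (not the authors' released code) of the two ingredients the abstract combines: ranking convolutional filters by the L1 norm of their weights, which is the criterion used in filter pruning, and fine-tuning the pruned student network with a distillation loss that blends the temperature-softened outputs of the unpruned teacher with the usual hard-label loss. The helper names, the temperature T, the weight alpha, and the keep ratio are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

def rank_filters_by_l1(conv: nn.Conv2d, keep_ratio: float = 0.7) -> torch.Tensor:
    # Score each output filter by the L1 norm of its weights and return the
    # indices of the highest-scoring filters; the remaining filters would be pruned.
    # keep_ratio is an illustrative value, not the paper's setting.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(round(keep_ratio * scores.numel())))
    return torch.argsort(scores, descending=True)[:n_keep]

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Fine-tuning loop: the unpruned teacher is frozen, only the pruned student learns.
# (teacher, student, loader and optimizer are assumed to be defined elsewhere.)
# teacher.eval()
# for images, labels in loader:
#     with torch.no_grad():
#         t_logits = teacher(images)
#     loss = distillation_loss(student(images), t_logits, labels)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()

The T*T factor on the soft term is the standard scaling from the original knowledge-distillation formulation; it keeps the gradient contribution of the soft targets comparable to that of the hard labels as the temperature grows.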

Acknowledgment

The authors gratefully acknowledge the support of the Ministry of Science and Technology, Taiwan, under grant MOST-109-2218-E-011-012-

Author information


Corresponding author

Correspondence to Jenq-Shiou Leu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Prakosa, S.W., Leu, JS. & Chen, ZH. Improving the accuracy of pruned network using knowledge distillation. Pattern Anal Applic 24, 819–830 (2021). https://doi.org/10.1007/s10044-020-00940-2

