Skip to main content
Log in

One-stage object detection knowledge distillation via adversarial learning

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Impressive methods for object detection tasks have been proposed based on convolutional neural networks (CNNs), however, they usually use very computation expensive deep networks to obtain such significant performance. Knowledge distillation has attracted much attention in the task of image classification lately since it can use compact models that reduce computations while preserving performance. Moreover, the best performing deep neural networks often assemble the outputs of multiple networks in an average way. However, the memory required to store these networks, and the time required to execute them in inference, which prohibits these methods used in real-time applications. In this paper, we present a knowledge distillation method for one-stage object detection, which can assemble a variety of large, complex trained networks into a lightweight network. In order to transfer diverse knowledge from various trained one-stage object detection networks, an adversarial-based learning strategy is employed as supervision to guide and optimize the lightweight student network to recover the knowledge of teacher networks, and to enable the discriminator module to distinguish the feature of teacher and student simultaneously. The proposed method exhibits two predominant advantages: (1) The lightweight student model can learn the knowledge of the teacher, which contains richer discriminative information than the model trained from scratch. (2) Faster inference speed than traditional ensemble methods from multiple networks is realized. A large number of experiments are carried out on PASCAL VOC and MS COCO datasets to verify the effectiveness of the proposed method for one-stage object detection, which obtains 3.43%, 2.48%, and 5.78% mAP promotions for vgg11-ssd, mobilenetv1-ssd-lite and mobilenetv2-ssd-lite student network on the PASCAL VOC 2007 dataset, respectively. Furthermore, with multi-teacher ensemble method, vgg11-ssd gains 7.10% improvement, which is remarkable.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Saba T, Khan M A, Rehman A, Marie-Sainte S L (2019) Region extraction and classification of skin cancer: a heterogeneous framework of deep cnn features fusion and reduction. J Med Syst 43(9):1–19

    Article  Google Scholar 

  2. Khan M A, Khan M A, Ahmed F, Mittal M, Goyal L M, Hemanth D J, Satapathy S C (2020a) Gastrointestinal diseases segmentation and classification based on duo-deep architectures. Pattern Recogn Lett 131:193–204

    Article  Google Scholar 

  3. Khan MA, Kadry S, Alhaisoni M, Nam Y, Zhang Y, Rajinikanth V, Sarfraz MS (2020b) Computer-aided gastrointestinal diseases analysis from wireless capsule endoscopy: A framework of best features selection. IEEE Access 8:132850–132859

  4. Khan MA, Sarfraz MS, Alhaisoni M, Albesher AA, Wang S, Ashraf I (2020c) Stomachnet: optimal deep learning features fusion for stomach abnormalities classification. IEEE Access 8:197969–197981

  5. Ibrahim S W (2016) A comprehensive review on intelligent surveillance systems. Commun Sci Technol 1(1)

  6. Lin S C, Zhang Y, Hsu C H, Skach M, Haque M E, Tang L, Mars J (2018) The architectural implications of autonomous driving: Constraints and acceleration. In: ACM SIGPLAN Notices, vol 53. ACM, pp 751–766

  7. Kaiming H, Xiangyu Z, Shaoqing R, Jian S (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–16

    Google Scholar 

  8. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  9. Ren S, He K, Girshick R, Jian S (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: International Conference on Neural Information Processing Systems

  10. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C (2016) Ssd: Single shot multibox detector. In: European Conference on Computer Vision

  11. Lin T Y, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision & Pattern Recognition

  12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  13. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

    Article  Google Scholar 

  14. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:150302531

  15. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2014) Fitnets: Hints for thin deep nets. arXiv:14126550

  16. Zagoruyko S, Komodakis N (2016) Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv:161203928

  17. Chen G, Choi W, Yu X, Han T, Chandraker M (2017) Learning efficient object detection models with knowledge distillation. In: Advances in Neural Information Processing Systems, pp 742–751

  18. Khan M A, Zhang Y D, Sharif M, Akram T (2021a) Pixels to classes: Intelligent learning framework for multiclass skin lesion localization and classification. Comput Electr Eng 90:106956

    Article  Google Scholar 

  19. Khan M U, Aziz S, Akram T, Amjad F, Iqtidar K, Nam Y, Khan M A (2021b) Expert hypertension detection system featuring pulse plethysmograph signals and hybrid feature selection and reduction scheme. Sensors 21(1):247

    Article  Google Scholar 

  20. Khan MA, Akram T, Zhang YD, Sharif M (2021c) Attributes based skin lesion detection and recognition: A mask rcnn and transfer learning-based deep learning framework. Pattern Recogn Lett 143:58–66

  21. Afza F, Khan M A, Sharif M, Kadry S, Manogaran G, Saba T, Ashraf I, Damaševičius R (2021) A framework of human action recognition using length control features fusion and weighted entropy-variances based feature selection. Image Vis Comput 106:104090

    Article  Google Scholar 

  22. Rashid M, Khan M A, Alhaisoni M, Wang S H, Naqvi S R, Rehman A, Saba T (2020) A sustainable deep learning framework for object recognition using multi-layers deep features fusion and selection. Sustainability 12(12):5037

    Article  Google Scholar 

  23. Hussain N, Khan M A, Sharif M, Khan S A, Albesher A A, Saba T, Armaghan A (2020) A deep neural network and classical features based scheme for objects recognition: an application for machine inspection. Multimed Tools Appl:1–23

  24. Rashid M, Khan M A, Sharif M, Raza M, Sarfraz M M, Afza F (2019) Object detection and classification: a joint selection and fusion strategy of deep convolutional neural network and sift point features. Multimed Tools Appl 78(12):15751–15777

    Article  Google Scholar 

  25. Li Q, Jin S, Yan J (2017) Mimicking very efficient network for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6356–6364

  26. Wang T, Yuan L, Zhang X, Feng J (2019) Distilling object detectors with fine-grained feature imitation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp 4928–4937

  27. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  28. Tian Z, Shen C, Chen H, He T (2019) Fcos: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 9627–9636

  29. Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: Keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 6569–6578

  30. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  31. Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387

  32. Zhang Y, Ding M, Bai Y, Liu D, Ghanem B (2019a) Learning a strong detector for action localization in videos. Pattern Recogn Lett 128:407–413

    Article  Google Scholar 

  33. Zhang Y, Ding M, Bai Y, Xu M, Ghanem B (2019b) Beyond weakly-supervised: Pseudo ground truths mining for missing bounding-boxes object detection. IEEE Transactions on Circuits and Systems for Video Technology

  34. Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018a) Weakly-supervised object detection via mining pseudo ground truth bounding-boxes. Pattern Recogn 84:68–81

    Article  Google Scholar 

  35. Zhang Y, Bai Y, Ding M, Li Y, Ghanem B (2018b) W2f: A weakly-supervised to fully-supervised framework for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 928–936

  36. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  37. Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: Deconvolutional single shot detector. arXiv:170106659

  38. Sun C, Ai Y, Wang S, Zhang W (2020) Mask-guided ssd for small-object detection. Appl Intell:1–12

  39. Li Z, Zhou F (2017) Fssd: feature fusion single shot multibox detector. arXiv:171200960

  40. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212

  41. Shen Z, Liu Z, Li J, Jiang YG, Chen Y, Xue X (2017) Dsod: Learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision, pp 1919–1927

  42. Zhou P, Ni B, Geng C, Hu J, Xu Y (2018) Scale-transferrable object detection. In: proceedings of the IEEE conference on computer vision and pattern recognition, pp 528–537

  43. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:170404861

  44. Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2016) Pruning filters for efficient convnets. arXiv:160808710

  45. Kumar A, Shaikh A M, Li Y, Bilal H, Yin B (2020) Pruning filters with l1-norm and capped l1-norm for cnn compression. Appl Intell:1–9

  46. Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems, pp 1135– 1143

  47. Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv:160202830

  48. Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnor-net: Imagenet classification using binary convolutional neural networks. In: European conference on computer vision. Springer, pp 525–542

  49. Jaderberg M, Vedaldi A, Zisserman A (2014) Speeding up convolutional neural networks with low rank expansions. arXiv:14053866

  50. Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, pp 1269–1277

  51. Zhang X, Zou J, Ming X, He K, Sun J (2015) Efficient and accurate approximations of nonlinear convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and pattern Recognition, pp 1984–1992

  52. Yim J, Joo D, Bae J, Kim J (2017) A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4133–4141

  53. Fukuda T, Suzuki M, Kurata G, Thomas S, Cui J, Ramabhadran B (2017) Efficient knowledge distillation from an ensemble of teachers. In: Interspeech, pp 3697–3701

  54. Xu Z, Hsu YC, Huang J (2017) Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks. arXiv:170900513

  55. Tarvainen A, Valpola H (2017) Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in neural information processing systems, pp 1195–1204

  56. Heo B, Lee M, Yun S, Choi J Y (2019) Knowledge transfer via distillation of activation boundaries formed by hidden neurons. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 33, pp 3779–3787

  57. Zhang Y, Xiang T, Hospedales TM, Lu H (2018) Deep mutual learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4320–4328

  58. Mirzadeh SI, Farajtabar M, Li A, Ghasemzadeh H (2019) Improved knowledge distillation via teacher assistant: Bridging the gap between student and teacher. arXiv:190203393

  59. Park W, Kim D, Lu Y, Cho M (2019) Relational knowledge distillation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3967–3976

  60. Wang T, Yuan L, Zhang X, Feng J (2019) Distilling object detectors with fine-grained feature imitation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4933–4942

  61. Shen Z, He Z, Xue X (2019) Meal: Multi-model ensemble via adversarial learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 33, pp 4886–4893

  62. Oyedotun O K, Aouada D, Ottersten B, et al. (2020) Deep network compression with teacher latent subspace learning and lasso. Appl Intell:1–20

  63. Bagherinezhad H, Horton M, Rastegari M, Farhadi A (2018) Label refinery: Improving imagenet classification through label progression. arXiv:180502641

  64. Gupta S, Hoffman J, Malik J (2016) Cross modal distillation for supervision transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2827–2836

  65. Furlanello T, Lipton ZC, Tschannen M, Itti L, Anandkumar A (2018) Born again neural networks. arXiv:180504770

  66. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680

  67. Wang X, Shrivastava A, Gupta A (2017) A-fast-rcnn: Hard positive generation via adversary for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2606–2615

  68. Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230

  69. Bai Y, Zhang Y, Ding M, Ghanem B (2018) Sod-mtgan: Small object detection via multi-task generative adversarial network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 206–221

  70. Zhang Y, Ding M, Bai Y, Ghanem B (2019) Detecting small faces in the wild based on generative adversarial network and contextual information. Pattern Recogn 94:74–86

    Article  Google Scholar 

  71. Bai Y, Zhang Y, Ding M, Ghanem B (2018) Finding tiny faces in the wild with generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 21–30

  72. Guo W, Cai J, Wang S (2020) Unsupervised discriminative feature representation via adversarial auto-encoder. Appl Intell 50(4):1155–1171

    Article  Google Scholar 

  73. Heo B, Kim J, Yun S, Park H, Kwak N, Choi JY (2019) A comprehensive overhaul of feature distillation. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1921–1930

Download references

Acknowledgements

This work was supported by China Postdoctoral Science Foundation, Grant No.259822.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Yongqiang Zhang or Mingli Ding.

Ethics declarations

Conflict of interest

We would like to note that in the manuscript entitled “One-stage Object Detection Knowledge Distillation via Adversarial Learning”, no conflict of interest exits in the submission of this manuscript, and manuscript is approved by all authors for publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Dong, N., Zhang, Y., Ding, M. et al. One-stage object detection knowledge distillation via adversarial learning. Appl Intell 52, 4582–4598 (2022). https://doi.org/10.1007/s10489-021-02634-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-021-02634-6

Keywords

Navigation