One-stage object detection knowledge distillation via adversarial learning

Dong, Na; Zhang, Yongqiang; Ding, Mingli; Xu, Shibiao; Bai, Yancheng

doi:10.1007/s10489-021-02634-6

Published: 24 July 2021

Volume 52, pages 4582–4598, (2022)

Applied Intelligence

Na Dong¹,
Yongqiang Zhang¹,
Mingli Ding¹,
Shibiao Xu² &
…
Yancheng Bai³

1050 Accesses
8 Citations

Abstract

Impressive methods for object detection have been proposed based on convolutional neural networks (CNNs); however, they usually rely on computationally expensive deep networks to achieve such performance. Knowledge distillation has recently attracted much attention in image classification because it allows compact models to reduce computation while preserving performance. Moreover, the best-performing deep neural networks often ensemble the outputs of multiple networks by averaging them. However, the memory required to store these networks and the time required to run them at inference prohibit their use in real-time applications. In this paper, we present a knowledge distillation method for one-stage object detection that can distill a variety of large, complex trained networks into a single lightweight network. To transfer diverse knowledge from various trained one-stage object detection networks, an adversarial learning strategy is employed as supervision: it guides and optimizes the lightweight student network to recover the knowledge of the teacher networks, while the discriminator module learns to distinguish teacher features from student features. The proposed method has two main advantages: (1) the lightweight student model learns the knowledge of the teacher, which contains richer discriminative information than a model trained from scratch; (2) inference is faster than with traditional ensembles of multiple networks. Extensive experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the proposed method for one-stage object detection: it improves mAP by 3.43%, 2.48%, and 5.78% for the vgg11-ssd, mobilenetv1-ssd-lite, and mobilenetv2-ssd-lite student networks, respectively, on the PASCAL VOC 2007 dataset. Furthermore, with the multi-teacher ensemble method, vgg11-ssd gains a 7.10% improvement.
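The abstract names two ingredients: averaging the outputs of several teacher networks, and an adversarial objective in which a discriminator separates teacher features from student features while the student tries to fool it. The abstract does not give the loss formulas, so the following is only a minimal sketch that assumes the standard GAN-style binary cross-entropy objective. All function names are hypothetical, and the discriminator outputs are stand-in probabilities rather than values from a real network.

```python
import math

def average_ensemble(teacher_scores):
    """Average per-class scores from several teachers (a simple ensemble)."""
    n = len(teacher_scores)
    return [sum(column) / n for column in zip(*teacher_scores)]

def bce(p, label, eps=1e-12):
    """Binary cross-entropy of one probability p against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def discriminator_loss(d_teacher, d_student):
    """Assumed objective: teacher features should score 1, student features 0."""
    return bce(d_teacher, 1) + bce(d_student, 0)

def student_adversarial_loss(d_student):
    """Assumed objective: the student wants its features to be scored as teacher (1)."""
    return bce(d_student, 1)

if __name__ == "__main__":
    # Two hypothetical teachers scoring the same two classes.
    print(average_ensemble([[0.8, 0.2], [0.6, 0.4]]))
    # d_* values are stand-in discriminator probabilities, not real model outputs.
    print(discriminator_loss(d_teacher=0.9, d_student=0.2))
    print(student_adversarial_loss(d_student=0.2))
```

Under this assumed objective, the student's adversarial loss shrinks as the discriminator becomes less able to tell its features from the teacher's. In practice it would be combined with the usual detection losses, which this sketch omits.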

Acknowledgements

This work was supported by the China Postdoctoral Science Foundation, Grant No. 259822.

Author information

Authors and Affiliations

School of Instrument Science and Engineering, Harbin Institute of Technology (HIT), Harbin, China
Na Dong, Yongqiang Zhang & Mingli Ding
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Shibiao Xu
Institute of Software, Chinese Academy of Sciences, Beijing, China
Yancheng Bai

Corresponding authors

Correspondence to Yongqiang Zhang or Mingli Ding.

Ethics declarations

Conflict of interest

We would like to note that, for the manuscript entitled “One-stage Object Detection Knowledge Distillation via Adversarial Learning”, no conflict of interest exists in its submission, and the manuscript is approved by all authors for publication.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dong, N., Zhang, Y., Ding, M. et al. One-stage object detection knowledge distillation via adversarial learning. Appl Intell 52, 4582–4598 (2022). https://doi.org/10.1007/s10489-021-02634-6

Accepted: 17 June 2021
Published: 24 July 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s10489-021-02634-6
