Abstract
Impressive methods for object detection have been proposed based on convolutional neural networks (CNNs), but they usually rely on computationally expensive deep networks to achieve their performance. Knowledge distillation has recently attracted much attention in image classification because it allows compact models to reduce computation while preserving performance. Moreover, the best-performing deep neural networks often ensemble the outputs of multiple networks by averaging them. However, the memory required to store these networks and the time required to run them at inference prohibit their use in real-time applications. In this paper, we present a knowledge distillation method for one-stage object detection that can distill a variety of large, complex trained networks into a single lightweight network. To transfer diverse knowledge from various trained one-stage object detection networks, an adversarial learning strategy is employed as supervision: it guides and optimizes the lightweight student network to recover the knowledge of the teacher networks, while the discriminator module is simultaneously trained to distinguish teacher features from student features. The proposed method has two main advantages: (1) the lightweight student model learns the knowledge of the teacher, which contains richer discriminative information than a model trained from scratch; (2) inference is faster than with traditional ensemble methods that run multiple networks. Extensive experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the proposed method for one-stage object detection. On the PASCAL VOC 2007 dataset, it obtains mAP improvements of 3.43%, 2.48%, and 5.78% for the vgg11-ssd, mobilenetv1-ssd-lite, and mobilenetv2-ssd-lite student networks, respectively. Furthermore, with the multi-teacher ensemble method, vgg11-ssd gains a remarkable 7.10% improvement.
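To make the adversarial supervision described above more concrete, the following is a minimal sketch of a generic GAN-style distillation objective, not the loss actually used in the paper. The abstract does not specify the discriminator architecture, the features that are compared, or the exact loss terms, so everything here is an assumption for illustration: a hypothetical linear discriminator over small feature vectors, a binary cross-entropy loss that labels teacher features 1 and student features 0, a student loss that rewards fooling the discriminator, and a plain element-wise mean as the multi-teacher ensemble.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(features, weights, bias):
    """Hypothetical linear discriminator: probability that `features` came from a teacher."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(score)


def discriminator_loss(teacher_feat, student_feat, weights, bias):
    """Binary cross-entropy with teacher features labelled 1 and student features labelled 0."""
    p_teacher = discriminator(teacher_feat, weights, bias)
    p_student = discriminator(student_feat, weights, bias)
    return -math.log(p_teacher) - math.log(1.0 - p_student)


def student_adversarial_loss(student_feat, weights, bias):
    """Low when the discriminator mistakes the student's features for a teacher's."""
    return -math.log(discriminator(student_feat, weights, bias))


def ensemble_average(teacher_feats):
    """Element-wise mean of several teachers' feature vectors (assumed averaging ensemble)."""
    n = len(teacher_feats)
    return [sum(column) / n for column in zip(*teacher_feats)]


if __name__ == "__main__":
    # Toy numbers only; real features would come from detector backbones or heads.
    weights, bias = [0.5, 0.5], 0.0
    teacher = ensemble_average([[1.0, 2.0], [3.0, 4.0]])
    student = [0.0, 0.0]
    print("discriminator loss:", discriminator_loss(teacher, student, weights, bias))
    print("student adversarial loss:", student_adversarial_loss(student, weights, bias))
```

In an actual training loop the two losses would be minimized alternately, the discriminator with respect to its own parameters and the student with respect to its network weights, typically alongside the usual detection loss.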
Acknowledgements
This work was supported by the China Postdoctoral Science Foundation, Grant No. 259822.
Ethics declarations
Conflict of interest
We declare that, for the manuscript entitled “One-stage Object Detection Knowledge Distillation via Adversarial Learning”, no conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.
Cite this article
Dong, N., Zhang, Y., Ding, M. et al. One-stage object detection knowledge distillation via adversarial learning. Appl Intell 52, 4582–4598 (2022). https://doi.org/10.1007/s10489-021-02634-6
DOI: https://doi.org/10.1007/s10489-021-02634-6