Abstract
With the development of convolutional neural networks, deep learning-based target detection algorithms have made significant breakthroughs. However, existing detectors based on convolutional neural networks downsample the whole image to extract deep semantic information, which can lose the spatial information of small targets. As a result, although these detectors achieve impressive results on large and medium-sized targets, their results on small targets are not satisfactory. To increase the detection precision of small targets, we propose GAN-STD, an end-to-end generative adversarial network for small target detection. During feature extraction, GAN-STD uses generative adversarial networks to exploit the structural correlation between targets at different scales: it makes the representation of small targets in the shallow feature map more similar to that of large targets in the deep feature map, reducing the gap between the two representations so that small targets become as easy to detect as large targets. In addition, so that the detector localizes and classifies small targets better, we back-propagate the detector losses to the generator and discriminator for end-to-end training. To validate its effectiveness, we integrate GAN-STD into two widely used one-stage target detectors (SSD and YOLOv4). Extensive experiments on the widely used PASCAL VOC, MS COCO, and TT100K datasets show that the proposed GAN-STD algorithm achieves excellent results for detecting small targets.
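The end-to-end training idea in the abstract, where detector losses are passed back to both the generator and the discriminator, can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption: the adversarial terms use the standard GAN binary cross-entropy objective, large-target deep features are treated as "real" and generator-enhanced small-target features as "fake", and the names `bce`, `generator_loss`, `discriminator_loss`, and `det_weight` are hypothetical rather than taken from the paper.

```python
import math


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for a single predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def discriminator_loss(d_large, d_small_enhanced, det_loss, det_weight=1.0):
    """Assumed discriminator objective: adversarial term plus detector loss.

    d_large: discriminator output for a large-target deep feature
             (treated as "real", label 1; an assumption).
    d_small_enhanced: discriminator output for a generator-enhanced
             small-target feature (treated as "fake", label 0).
    det_loss: detector loss (localization + classification), added so the
             detector's error also influences the discriminator.
    det_weight: hypothetical balancing weight, not from the paper.
    """
    adversarial = bce(d_large, 1.0) + bce(d_small_enhanced, 0.0)
    return adversarial + det_weight * det_loss


def generator_loss(d_small_enhanced, det_loss, det_weight=1.0):
    """Assumed generator objective: fool the discriminator, help the detector."""
    adversarial = bce(d_small_enhanced, 1.0)
    return adversarial + det_weight * det_loss


if __name__ == "__main__":
    # Illustrative values only; they are not results from the paper.
    print(generator_loss(d_small_enhanced=0.3, det_loss=1.2))
    print(discriminator_loss(d_large=0.8, d_small_enhanced=0.3, det_loss=1.2))
```

In this sketch the generator's loss falls as the discriminator rates enhanced small-target features closer to large-target features, while the shared detector term keeps both networks tied to localization and classification quality. In a real implementation these scalars would be tensors in an automatic-differentiation framework.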
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, H., Qian, H. & Feng, S. GAN-STD: small target detection based on generative adversarial network. J Real-Time Image Proc 21, 65 (2024). https://doi.org/10.1007/s11554-024-01446-4
DOI: https://doi.org/10.1007/s11554-024-01446-4