GAN-STD: small target detection based on generative adversarial network

  • Research
  • Journal of Real-Time Image Processing

Abstract

With the development of convolutional neural networks, deep learning-based target detection algorithms have made significant breakthroughs. However, existing detectors based on convolutional neural networks must downsample the whole image to extract deep semantic information, which can lead to the loss of spatial information for small targets. As a result, although these algorithms achieve impressive results on large and medium-sized targets, their results on small targets are not satisfactory. To increase the detection precision of small targets, we propose GAN-STD, an end-to-end generative adversarial network for small target detection. GAN-STD uses generative adversarial networks to exploit the structural correlation between targets at different scales: during feature extraction, it makes the representation of small targets in the shallow feature map more similar to that of large targets in the deep feature map, reducing the difference between the two representations and thus making small targets as easy to detect as large targets. In addition, so that the detector localizes and classifies small targets better, we back-propagate the detector losses to the generator and discriminator for end-to-end training. To validate GAN-STD, we integrate it into two widely used one-stage target detectors (SSD and YOLOv4). Extensive experiments on the PASCAL VOC, MS COCO, and TT100K datasets show that the proposed GAN-STD algorithm achieves excellent results for detecting small targets.
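
The training signal described in the abstract can be illustrated with a minimal sketch. This is not the paper's loss: the abstract gives no formulas, so the standard binary cross-entropy GAN objective is used here instead. The function names, the `det_weight` parameter, and the choice to treat large-target deep features as "real" and generator-enhanced small-target features as "fake" are all assumptions made for illustration.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for a single probability, clamped away from 0 and 1."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(d_large, d_small_enhanced, detector_loss, det_weight=1.0):
    """Assumed objective: score large-target features as real (1) and
    generator-enhanced small-target features as fake (0), plus a
    hypothetical weighted detector term (the abstract says detector
    losses reach the discriminator, but not how)."""
    adv = mean([bce(p, 1.0) for p in d_large]) + \
          mean([bce(p, 0.0) for p in d_small_enhanced])
    return adv + det_weight * detector_loss

def generator_loss(d_small_enhanced, detector_loss, det_weight=1.0):
    """Assumed objective: push enhanced small-target features toward the
    'real' label, plus the same hypothetical weighted detector term."""
    adv = mean([bce(p, 1.0) for p in d_small_enhanced])
    return adv + det_weight * detector_loss

# Hypothetical discriminator outputs (probabilities) and detector loss.
d_large = [0.9, 0.8]
d_small_enhanced = [0.3, 0.4]
det = 0.5
print(discriminator_loss(d_large, d_small_enhanced, det))
print(generator_loss(d_small_enhanced, det))
```

In this sketch the generator's loss falls as the discriminator scores its enhanced small-target features closer to 1, which is one way to express "making small targets as easy to detect as large targets" as an optimization target. The detector term ties both networks to localization and classification quality.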

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).

Author information

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Qian, H. & Feng, S. GAN-STD: small target detection based on generative adversarial network. J Real-Time Image Proc 21, 65 (2024). https://doi.org/10.1007/s11554-024-01446-4

  • DOI: https://doi.org/10.1007/s11554-024-01446-4
