An enhanced SSD with feature cross-reinforcement for small-object detection

Applied Intelligence

A Correction to this article was published on 06 May 2023

This article has been updated

Abstract

Because small objects carry limited feature information, it is difficult for a single-shot multibox detector (SSD) to quickly attend to the image regions that contain them. We propose an enhanced SSD based on feature cross-reinforcement (FCR-SSD). During shallow sampling, an improved group shuffling-efficient channel attention (GS-ECA) mechanism makes the model focus on object areas rather than the background. An FCR module then passes multiscale information from the shallow layer to the subsequent layer and fuses it into an enhanced feature map, which improves the use of context information associated with small objects. We also develop an adaptive algorithm for selecting positive and negative samples: it derives the positive and negative selection thresholds from the intersection over union (IoU) between candidate boxes and ground-truth boxes, and sets the threshold adaptively for each ground-truth box. The proposed FCR-SSD algorithm achieves 79.6% mean average precision (mAP) on the PASCAL VOC 2007 dataset and 30.1% mAP on the MS COCO dataset at 34.2 frames per second (FPS) when run on an RTX 3080Ti GPU. The experimental results show that the FCR-SSD model combines high accuracy with good detection speed in small-object detection tasks.
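The abstract does not give the formula used to set the per-ground-truth threshold, so the following is only a minimal sketch of the general idea of IoU-based adaptive sample selection. The rule used here (mean plus standard deviation of the candidate IoUs for each ground-truth box) is an assumption borrowed for illustration and is not claimed to be the authors' algorithm; the function names `iou` and `assign_positives` are hypothetical.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_positives(gt_boxes, candidates):
    """Map each ground-truth index to the candidate indices labelled positive.

    Assumed rule (illustrative only): a candidate is positive for a
    ground-truth box when its IoU reaches mean + std of all candidate
    IoUs for that box. Requires at least one candidate box.
    """
    positives = {}
    for g, gt in enumerate(gt_boxes):
        ious = [iou(gt, c) for c in candidates]
        threshold = mean(ious) + pstdev(ious)  # one threshold per ground-truth box
        positives[g] = [i for i, v in enumerate(ious) if v >= threshold]
    return positives


if __name__ == "__main__":
    gts = [(0, 0, 10, 10)]
    cands = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
    print(assign_positives(gts, cands))
```

Because the threshold is computed from the IoU statistics of each ground-truth box, a small object whose candidates all overlap it weakly can still receive positives, which a single fixed IoU cutoff might deny it.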

Author information

Corresponding author

Correspondence to Xiao Huang.

Additional information

The original online version of this article was revised: The incorrect Figures 5, 6, and 7 were captured.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, L., Huang, X., Chao, Y. et al. An enhanced SSD with feature cross-reinforcement for small-object detection. Appl Intell 53, 19449–19465 (2023). https://doi.org/10.1007/s10489-023-04544-1

  • DOI: https://doi.org/10.1007/s10489-023-04544-1
