Abstract
Because small objects in images carry limited feature information, it is difficult for a single-shot multibox detector (SSD) to quickly attend to the important regions of these objects. We propose an enhanced SSD based on feature cross-reinforcement (FCR-SSD). For shallow sampling, an improved group shuffling-efficient channel attention (GS-ECA) mechanism makes the model focus on object areas rather than the background. An FCR module then passes the multiscale information from the shallow layer to the subsequent layer and fuses it to generate an enhanced feature map, which improves the utilization of the context information associated with small objects. We also develop an adaptive algorithm for selecting positive and negative samples: it computes intersection over union (IoU) thresholds from the candidate boxes and ground-truth boxes, adaptively determining a threshold for each ground-truth box. The proposed FCR-SSD algorithm achieves 79.6% mean average precision (mAP) on the PASCAL VOC 2007 dataset and 30.1% mAP on the MS COCO dataset at 34.2 frames per second (FPS) when run on an RTX 3080Ti GPU. The experimental results show that the FCR-SSD model yields high accuracy and a good detection speed in small-target detection tasks.
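To make the idea of a per-ground-truth threshold concrete, the following is a minimal sketch of IoU-based sample selection. The abstract does not state the threshold rule, so the code assumes one for illustration only: the mean plus the population standard deviation of the IoUs between a ground-truth box and all candidate boxes. The function names (`iou`, `adaptive_threshold`, `assign_samples`) and the box format `(x1, y1, x2, y2)` are likewise hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def adaptive_threshold(gt_box, candidates):
    """Assumed rule (not from the paper): mean + std of the candidate IoUs."""
    ious = [iou(gt_box, c) for c in candidates]
    return mean(ious) + pstdev(ious), ious


def assign_samples(gt_boxes, candidates):
    """Label each candidate with a ground-truth index, or -1 for negative."""
    labels = [-1] * len(candidates)
    best_iou = [0.0] * len(candidates)
    for g, gt in enumerate(gt_boxes):
        threshold, ious = adaptive_threshold(gt, candidates)
        for i, value in enumerate(ious):
            # Positive only if the IoU clears this box's own threshold;
            # a candidate claimed twice goes to the higher-IoU ground truth.
            if value >= threshold and value > best_iou[i]:
                labels[i] = g
                best_iou[i] = value
    return labels
```

With one ground-truth box `(0, 0, 10, 10)` and candidates `(0, 0, 10, 10)`, `(5, 5, 15, 15)`, and `(20, 20, 30, 30)`, only the first candidate clears the threshold under this assumed rule; the other two are treated as negatives. A fixed threshold such as 0.5 would give the same split here, but a per-box threshold can adapt when a small object has uniformly low IoUs with every candidate.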
Change history
06 May 2023
A Correction to this paper has been published: https://doi.org/10.1007/s10489-023-04640-2
Additional information
The original online version of this article was revised: The incorrect Figures 5, 6, and 7 were captured.
About this article
Cite this article
Gong, L., Huang, X., Chao, Y. et al. An enhanced SSD with feature cross-reinforcement for small-object detection. Appl Intell 53, 19449–19465 (2023). https://doi.org/10.1007/s10489-023-04544-1
DOI: https://doi.org/10.1007/s10489-023-04544-1