Abstract
In real scenes, target images are complex and diverse because of variations in scale, occlusion, and appearance, which degrades the performance of target detection algorithms. Existing algorithms consequently suffer from two problems. On the one hand, the neurons in the detection architecture cannot learn the complex interactions and semantic features inside the target image. On the other hand, the feature expression of different target images is insufficient, and channel reduction leads to the loss of position information. Therefore, a multi-layer attention mechanism that considers both node-level and semantic-level attention in the model architecture is proposed. In this method, the fusion of neighbor and semantic information is weighted, and node representations are learned in a hierarchical aggregation manner. This improves the effectiveness and interpretability of the model and addresses the problem of acquiring complex interactions and rich semantic features within images. Furthermore, we propose an adaptive feature fusion network that adaptively filters out useless information from other layers and retains the feature information that is beneficial to target recognition. A feature enhancement module is introduced to enhance the identifiability of the top-level target features of the feature network, which alleviates the loss of target position information. Finally, extensive tests on the PASCAL VOC and MSCOCO datasets show that our method not only achieves the best recognition performance but also has better stability and robustness.
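The abstract does not give the fusion formula, so the following is only an illustrative sketch of what adaptively weighting features from several layers can look like. It assumes that each layer contributes a feature vector of the same length and that each layer has one learnable scalar score; the names `adaptive_fuse`, `level_features`, and `level_scores` are hypothetical and do not come from the paper. A softmax turns the scores into weights, so a layer with a low score is effectively filtered out of the fused result.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features, level_scores):
    """Fuse same-length feature vectors from several layers.

    level_features: one feature vector per layer (assumed layout).
    level_scores:   one scalar score per layer (assumed to be learned).
    Returns the fused vector and the fusion weights.
    """
    if len(level_features) != len(level_scores):
        raise ValueError("need one score per feature level")
    weights = softmax(level_scores)
    dim = len(level_features[0])
    # Weighted sum across layers, computed position by position.
    fused = [
        sum(w * feats[i] for w, feats in zip(weights, level_features))
        for i in range(dim)
    ]
    return fused, weights


# Three layers with two-dimensional features (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores: every layer contributes equally.
fused_equal, weights_equal = adaptive_fuse(features, [0.0, 0.0, 0.0])

# One dominant score: the other layers are almost entirely suppressed.
fused_first, weights_first = adaptive_fuse(features, [10.0, -10.0, -10.0])
```

In a trained detector the scores would typically be predicted per spatial position from the features themselves rather than fixed per layer; the fixed scalars here only keep the example small.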
Data availability
The data used to support the findings of this study are included within the paper.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61701188), the Natural Science Foundation of Jiangsu Province (No. BK20201479), the China Postdoctoral Science Foundation (No. 2019M650512, No. 2021M692400), and a project funded by the Hebei IoT Monitoring Engineering Technology Research Center (No. IOT202004).
Funding
National Natural Science Foundation of China, 61701188, Fengping An.
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
About this article
Cite this article
An, F., Wang, J. Target detection algorithm based on multilayer attention mechanism-adaptive feature fusion network. Int. J. Mach. Learn. & Cyber. 14, 2685–2695 (2023). https://doi.org/10.1007/s13042-023-01791-z
DOI: https://doi.org/10.1007/s13042-023-01791-z