Target detection algorithm based on multilayer attention mechanism-adaptive feature fusion network

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

In real scenes, target images are complex and diverse because of variations in scale, occlusion and appearance, which degrades the performance of target detection algorithms. Existing algorithms suffer from two problems in particular. First, the neurons in the detection architecture cannot learn the complex interactions and semantic features inside the target image. Second, the feature expression of different target images is insufficient, and channel reduction leads to the loss of position information. Therefore, a multi-layer attention mechanism that considers both node-level and semantic-level attention in the model architecture is proposed. In this method, the fusion of neighbor and semantic information is weighted, and node representations are learned by hierarchical aggregation. This improves the effectiveness and interpretability of the model and addresses the problem of acquiring complex interactions and rich semantic features within images. Furthermore, we propose an adaptive feature fusion network that adaptively filters out the useless information of other layers and retains the feature information that is beneficial to target recognition. A feature enhancement module is introduced to enhance the identifiability of the top-level target features of the feature network, which alleviates the loss of target position information. Finally, extensive tests on the PASCAL VOC and MSCOCO datasets show that our method not only achieves the best recognition performance among the methods compared, but is also more stable and robust.
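
The abstract does not state the fusion formula, so the following is only a minimal sketch of what adaptive feature fusion across layers could look like, not the authors' implementation. It assumes that the feature maps from the different layers have already been resized to a common shape, and that each layer gets a per-pixel score from a learned 1x1 projection; the names `adaptive_fuse` and `score_weights` are hypothetical, and random values stand in for learned parameters. A softmax across layers turns the scores into weights, so layers with low scores at a location contribute little to the fused feature there.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_fuse(features, score_weights):
    """Fuse L same-shaped feature maps with per-pixel softmax weights.

    features:      array of shape (L, C, H, W), one map per layer,
                   assumed to be already resized to a common shape.
    score_weights: array of shape (L, C), a hypothetical stand-in for
                   learned 1x1 convolutions that score each layer.
    Returns the fused map (C, H, W) and the weights (L, H, W).
    """
    # A 1x1 convolution with one output channel is a dot product
    # over the channel axis at every pixel.
    scores = np.einsum("lchw,lc->lhw", features, score_weights)
    # Normalize across layers so the weights at each pixel sum to 1.
    weights = softmax(scores, axis=0)
    # Weighted sum of the layers; weights broadcast over channels.
    fused = (weights[:, None, :, :] * features).sum(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
features = rng.normal(size=(3, 8, 4, 4))   # 3 layers, 8 channels, 4x4 maps
score_weights = rng.normal(size=(3, 8))    # placeholder for learned parameters
fused, weights = adaptive_fuse(features, score_weights)
print(fused.shape, weights.shape)
```

With all-zero `score_weights` every layer receives the same weight and the fusion reduces to a plain average, which is a convenient sanity check for a sketch of this kind.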

Data availability

The data used to support the findings of this study are included within the paper.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61701188), the Natural Science Foundation of Jiangsu Province (No. BK20201479), the China Postdoctoral Science Foundation (No. 2019M650512, No. 2021M692400), and the Hebei IoT Monitoring Engineering Technology Research Center (No. IOT202004).

Funding

National Natural Science Foundation of China, 61701188, Fengping An.

Author information

Corresponding author

Correspondence to Fengping An.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

An, F., Wang, J. Target detection algorithm based on multilayer attention mechanism-adaptive feature fusion network. Int. J. Mach. Learn. & Cyber. 14, 2685–2695 (2023). https://doi.org/10.1007/s13042-023-01791-z

  • DOI: https://doi.org/10.1007/s13042-023-01791-z
