
DM-YOLOX aerial object detection method with intensive attention mechanism

Published in: The Journal of Supercomputing

Abstract

In aerial image detection, background interference, occlusion, and the prevalence of small objects make feature extraction difficult and reduce detection accuracy. This paper proposes DM-YOLOX, an aerial object detection method with an intensive attention mechanism. First, coordinate attention (CA) and a dense connection scheme are incorporated into the backbone network, enabling adaptive channel weighting throughout feature extraction. This enhances significant features while suppressing less relevant ones, strengthening the network's capacity to represent object features and ensuring that key features are retained and reinforced. Second, a multibranch extraction (MBE) module is added to the feature fusion network to improve the extraction of multi-scale feature information from wide-coverage images, raising the detection accuracy and efficiency for small and medium-sized objects in complex scenes. Finally, SIoU replaces IoU as the bounding box loss function, which mitigates the mismatch between ground-truth and predicted boxes, accelerates network convergence, and improves performance during training. After training and testing on the VisDrone 2019 dataset, the method detects small objects effectively in complex environments: DM-YOLOX improves mAP by 2.7% over the baseline network while increasing frames per second (FPS) by 8%.
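The SIoU loss adopted above generalises plain IoU with additional angle, distance, and shape penalty terms. As a point of reference, the IoU term that both losses share can be sketched as follows; the `(x1, y1, x2, y2)` box layout and the function name are assumptions for illustration only, not the paper's implementation:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU-based regression loss is then `1 - iou(pred, target)`; SIoU subtracts further penalties for angular misalignment, centre distance, and shape mismatch, which is what drives the faster convergence reported in the abstract.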



Data availability

The VisDrone dataset used in this paper is publicly available and can be downloaded from the Internet.


Funding

Youth Program of Shaanxi Province: 2022JQ-624. China University Industry Research and Innovation Fund: 2021ALA02002. The Higher Education Teaching Reform Research Project of China Textile Industry Association: 2021BKJGLX004. The Higher Education Research Project of Xi’an Polytechnic University: 20GJ05.

Author information


Contributions

Conceptualization was performed by XL and FW; methodology was done by XL and WW; software was developed by XL; validation was provided by FW; investigation was conducted by XL, YH, and JZ; writing—original draft preparation was performed by XL and FW; writing—review and editing were prepared by WW and FW; visualization was presented by YH and JZ. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Fengping Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, X., Wang, F., Wang, W. et al. DM-YOLOX aerial object detection method with intensive attention mechanism. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05944-x

