
DM-YOLOX aerial object detection method with intensive attention mechanism

Published in: The Journal of Supercomputing

Abstract

In aerial image detection, background interference, occlusion, and the prevalence of small objects make feature extraction difficult and reduce detection accuracy. This paper proposes DM-YOLOX, an aerial object detection method with an intensive attention mechanism. First, coordinate attention (CA) and a dense connection scheme are incorporated into the backbone network, enabling adaptive channel weighting throughout feature extraction. This enhances significant features while suppressing less relevant ones, strengthening the network's capacity to represent object features and ensuring that key features are retained and reinforced. Second, a multibranch extraction (MBE) module is added to the feature fusion network to improve the extraction of multi-scale feature information from wide-coverage images, raising the detection accuracy and efficiency for small and medium-sized objects in complex scenes. Finally, SIoU replaces IoU as the bounding box loss function, which mitigates the mismatch between ground-truth and predicted boxes, accelerates network convergence, and improves performance during training. After training and testing on the VisDrone 2019 dataset, the method detects small objects effectively in complex environments: DM-YOLOX improves mAP by 2.7% over the baseline network while increasing frames per second (FPS) by 8%.
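The SIoU loss adopted above generalises plain IoU with additional angle, distance, and shape penalty terms. As a point of reference, the IoU term that both losses share can be sketched as follows; the `(x1, y1, x2, y2)` box layout and the function name are assumptions for illustration only, not the paper's implementation:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU-based regression loss is then `1 - iou(pred, target)`; SIoU subtracts further penalties for angular misalignment, centre distance, and shape mismatch, which is what drives the faster convergence reported in the abstract.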



Data availability

The VisDrone dataset used in this paper is publicly available and can be downloaded from the Internet.


Funding

Youth Program of Shaanxi Province: 2022JQ-624. China University Industry Research and Innovation Fund: 2021ALA02002. The Higher Education Teaching Reform Research Project of China Textile Industry Association: 2021BKJGLX004. The Higher Education Research Project of Xi’an Polytechnic University: 20GJ05.

Author information


Contributions

Conceptualization was performed by XL and FW; methodology was done by XL and WW; software was developed by XL; validation was provided by FW; investigation was conducted by XL, YH, and JZ; writing—original draft preparation was performed by XL and FW; writing—review and editing were prepared by WW and FW; visualization was presented by YH and JZ. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Fengping Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, X., Wang, F., Wang, W. et al. DM-YOLOX aerial object detection method with intensive attention mechanism. J Supercomput (2024). https://doi.org/10.1007/s11227-024-05944-x

