DENS-YOLOv6: a small object detection model for garbage detection on water surface

Multimedia Tools and Applications

Abstract

Garbage detection on water surfaces is important for water surface garbage monitoring and automated garbage salvage. However, in such scenes the water background occupies most of the image, while the objects to be detected occupy only a small proportion of it. Moreover, the objects are easily affected by interference such as lighting, water waves, and reflections, which makes it difficult to extract object features and reduces detection accuracy. In this paper, we propose a Detail Enhancement Noise Suppression YOLOv6 (DENS-YOLOv6) detection algorithm based on YOLOv6. First, to better capture the detailed feature information of small objects, we design a Detail Information Enhancement Module (DIEM) based on atrous convolution. Second, to suppress noise interference on small objects, we develop an Adaptive Noise Suppression Module (ANSM). Finally, to improve the stability and convergence speed of model training, we employ a regression loss function based on the Normalized Wasserstein Distance (NWD) metric. Experiments were conducted on the Flow+ dataset, which contains a large number of small objects, and on the publicly available Pascal VOC2007 dataset. The mAP\(_S\) values reached 40.6% and 11.4%, respectively. Compared with other models, DENS-YOLOv6 achieved the highest small object detection accuracy.
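To make the NWD-based regression loss mentioned in the abstract more concrete, the following is a minimal Python sketch of the Normalized Wasserstein Distance as it is commonly defined for tiny object detection: each box (cx, cy, w, h) is modeled as a 2-D Gaussian, and the distance between the two Gaussians is mapped into (0, 1]. The function names, the (cx, cy, w, h) box format, and the constant `c` are assumptions made for illustration. The exact loss used in DENS-YOLOv6 may differ, for example in how it is weighted or combined with other terms.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two boxes in (cx, cy, w, h) form.

    Each box is treated as a Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). `c` is a dataset-dependent normalizing constant;
    the default is an assumed value, not one taken from the article.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred, target, c=12.8):
    """Regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(pred, target, c)


def iou(box_a, box_b):
    """Plain IoU for boxes in (cx, cy, w, h) form, included only for contrast."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 4x4 boxes whose centers are 5 pixels apart: they do not overlap.
    a = (10.0, 10.0, 4.0, 4.0)
    b = (15.0, 10.0, 4.0, 4.0)
    print("IoU:", iou(a, b))
    print("NWD:", nwd(a, b))
    print("NWD loss:", nwd_loss(a, b))
```

For two small boxes that do not overlap, IoU is zero regardless of how far apart they are, so an IoU-based loss gives no graded signal. NWD still varies smoothly with the offset, which is one plausible reason such a loss can help training stability on small objects.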

Data availability and access

Data is available on request from the authors.

Acknowledgements

This work was supported by the Jiangsu Petrochemical Process Key Equipment Digital Twin Technology Engineering Research Center Open Project (DTEC202103).

Author information

Contributions

Ning Li and Mingliang Wang led the conception and design of the work and the acquisition and interpretation of data, and drafted and revised the manuscript. Shoukun Xu approved the final version, which was reviewed by Gaochao Yang, Bo Li and Baohua Yuan.

Corresponding author

Correspondence to Shoukun Xu.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests related to this study.

Ethical and informed consent for data used

Ethical and informed consent for data used.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, N., Wang, M., Yang, G. et al. DENS-YOLOv6: a small object detection model for garbage detection on water surface. Multimed Tools Appl 83, 55751–55771 (2024). https://doi.org/10.1007/s11042-023-17679-7

  • DOI: https://doi.org/10.1007/s11042-023-17679-7
