Abstract
Detecting garbage on water surfaces is important for water surface garbage monitoring and automated salvage. In such scenes, however, the water background occupies most of the image, while the objects to be detected occupy only a small part of it. The objects are also easily affected by noise such as lighting, water waves, and reflections, which makes object features difficult to extract and reduces detection accuracy. In this paper, we propose Detail Enhancement Noise Suppression YOLOv6 (DENS-YOLOv6), a detection algorithm based on YOLOv6. First, to better capture the detailed feature information of small objects, we design a Detail Information Enhancement Module (DIEM) based on atrous convolution. Second, to suppress noise interference on small objects, we develop an Adaptive Noise Suppression Module (ANSM). Finally, to improve the stability and convergence speed of model training, we employ a regression loss function based on the Normalized Wasserstein Distance (NWD) metric. Experiments were conducted on the Flow+ dataset, which contains a large number of small objects, and on the publicly available Pascal VOC2007 dataset, where mAP\(_S\) reached 40.6% and 11.4%, respectively. Compared with the other models evaluated, DENS-YOLOv6 achieved the highest small object detection accuracy.
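To illustrate why an NWD-based regression loss can help with small objects, the following is a minimal sketch of the standard normalized Gaussian Wasserstein distance between two boxes. It is not the DENS-YOLOv6 implementation: the function names `nwd`, `nwd_loss`, and `iou` are hypothetical, and the normalizing constant `c` is a placeholder that would normally be chosen per dataset. Each box (cx, cy, w, h) is assumed to be modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4).

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent constant; 12.8 is only a placeholder here.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance into (0, 1]; identical boxes give exactly 1.
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred, target, c=12.8):
    """Regression loss sketch: 1 - NWD (0 when the boxes coincide)."""
    return 1.0 - nwd(pred, target, c)


def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, included only for comparison."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Two 4x4 boxes whose centers are 6 pixels apart do not overlap at all.
small_pred = (10.0, 10.0, 4.0, 4.0)
small_gt = (16.0, 10.0, 4.0, 4.0)
print("IoU:", iou(small_pred, small_gt))
print("NWD:", nwd(small_pred, small_gt))
print("NWD loss:", nwd_loss(small_pred, small_gt))
```

For the two non-overlapping small boxes, IoU is zero and gives no indication of how far apart they are, whereas NWD stays strictly positive and decreases smoothly with distance. This is the property usually cited as the motivation for NWD-based losses on small objects.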
Data availability and access
Data is available on request from the authors.
Acknowledgements
This work was supported by the Jiangsu Petrochemical Process Key Equipment Digital Twin Technology Engineering Research Center Open Project (DTEC202103).
Author information
Contributions
Ning Li and Mingliang Wang led the conception and design of the work, as well as the acquisition and interpretation of data. They also drafted the manuscript and revised it for intellectual content. Shoukun Xu approved the final version, which was reviewed by Gaochao Yang, Bo Li and Baohua Yuan.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests related to this study.
About this article
Cite this article
Li, N., Wang, M., Yang, G. et al. DENS-YOLOv6: a small object detection model for garbage detection on water surface. Multimed Tools Appl 83, 55751–55771 (2024). https://doi.org/10.1007/s11042-023-17679-7
DOI: https://doi.org/10.1007/s11042-023-17679-7