
Multi-dimensional, multi-functional and multi-level attention in YOLO for underwater object detection

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Underwater object detection is a prerequisite for underwater robots to achieve autonomous operation and ocean exploration. However, poor imaging quality, harsh underwater environments and concealed underwater targets greatly increase the difficulty of the task. To reduce underwater background interference and improve underwater object perception, we propose a multi-dimensional, multi-functional and multi-level attention module (mDFLAM). The multi-dimensional strategy first enhances the robustness of the attention by collecting valuable information across different target dimensions. The multi-functional strategy then improves the flexibility of attention calibration by capturing both the importance of channel semantic information and the dependence of spatial location information. The multi-level strategy finally enriches the diversity of attention perception by extracting intrinsic information under different receptive fields. In the pre-processing and post-processing stages, cross-splitting and cross-linking exploit the synergy between multi-dimensional and multi-functional attention by redistributing channel dimensions and restoring feature states; in the attention calibration stage, adaptive fusion exploits the synergy of multi-level attention through learnable fusion parameters. To meet the high-precision and real-time requirements of underwater object detection, we integrate the plug-and-play mDFLAM into YOLO detectors, where full-port embedding further strengthens semantic expression by improving the quality of feature fusion between scales. On underwater detection tasks, ablation and comparison experiments demonstrate the rationality and effectiveness of our attention design; on other detection tasks, our method shows good robustness and generalization.
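
To make the abstract's mechanism concrete, here is a minimal, illustrative PyTorch sketch of the described pipeline: cross-splitting the channels into groups, calibrating them with channel-wise (multi-functional) and multi-receptive-field spatial (multi-level) attention, fusing the levels with learnable weights, and cross-linking the groups back together. Everything in it (the class names, the two-way split, the SE-style channel branch, the CBAM-style spatial branches, and the kernel sizes 3, 5, 7) is an assumption made for demonstration; it is not the authors' implementation.

```python
# Illustrative sketch only: names, group counts and kernel sizes are
# assumptions made for demonstration, not the authors' implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel recalibration (SE-style): 'importance of channel semantics'."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)


class SpatialAttention(nn.Module):
    """Spatial recalibration: 'dependence of spatial location information'.
    The kernel size stands in for one receptive-field level."""

    def __init__(self, kernel_size: int):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                  # channel-average map
        mx = x.amax(dim=1, keepdim=True)                   # channel-max map
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class MDFLAMSketch(nn.Module):
    """Cross-split channels into two groups, calibrate one with channel
    attention and the other with multi-level spatial attention (adaptively
    fused via learnable weights), then cross-link the groups back."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        self.channel_att = ChannelAttention(half)
        self.spatial_atts = nn.ModuleList(
            [SpatialAttention(k) for k in kernel_sizes]
        )
        self.level_weights = nn.Parameter(torch.ones(len(kernel_sizes)))

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)                    # cross-splitting
        a = self.channel_att(a)
        w = torch.softmax(self.level_weights, dim=0)       # adaptive fusion
        b = sum(wi * att(b) for wi, att in zip(w, self.spatial_atts))
        return torch.cat([b, a], dim=1)                    # cross-linking


if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)    # a typical YOLO neck feature map
    print(MDFLAMSketch(64)(feat).shape)  # torch.Size([1, 64, 80, 80])
```

Because the sketch preserves the input tensor's shape, such a module could in principle be inserted at every fusion point of a YOLO neck, which is how we read the abstract's "full-port embedding".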


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China under Grant 61370142, Grant 61802043, Grant 61272368, Grant 62176037 and Grant 62002041, in part by the Fundamental Research Funds for the Central Universities under Grant 3132016352 and Grant 3132021238, in part by the Dalian Science and Technology Innovation Fund under Grant 2018J12GX037, Grant 2019J11CY001 and Grant 2021JJ12GX028, in part by the Liaoning Revitalization Talents Program under Grant XLYC1908007, in part by the Liaoning Doctoral Research Start-up Fund Project under Grant 2021-BS-075, and in part by the China Postdoctoral Science Foundation under Grant 3620080307.

Author information


Corresponding author

Correspondence to Xianping Fu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shen, X., Sun, X., Wang, H. et al. Multi-dimensional, multi-functional and multi-level attention in YOLO for underwater object detection. Neural Comput & Applic 35, 19935–19960 (2023). https://doi.org/10.1007/s00521-023-08781-w

