
An underwater target recognition algorithm incorporating improved attention mechanism and downsampling

  • Research
  • Published in: The Visual Computer

Abstract

Underwater target detection suffers from low recognition accuracy caused by dense and blurred targets. To address this, we propose a network that jointly improves the attention mechanism and downsampling. First, to handle dense targets, we introduce an improved channel attention module that strengthens attention to spatial-dimension information, highlights the saliency of feature maps across channels, and thereby improves the detection of dense targets. Second, to handle blurred underwater targets, we introduce a downsampling module that combines same-layer connections with cross-layer skips; it reduces the information loss caused by convolutional downsampling and fuses features from different layers more fully, enhancing the feature representation of the underwater image and, in turn, the network's detection accuracy on blurred targets. Finally, we adopt the focal loss function to address the imbalance between positive and negative samples: during training it dynamically down-weights easy-to-distinguish samples and prioritizes hard ones. Experimental results show a 2.71% increase in average accuracy for the improved algorithm on the DUO dataset, while computation is reduced by 9.1 GFLOPs and the parameter count by 5.44 M. Code: https://figshare.com/articles/dataset/improved-yolov5/25375129. Dataset: https://figshare.com/articles/dataset/DUO_zip/25370527.
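The "focal loss" the abstract refers to is the standard formulation of Lin et al.: a modulating factor (1 − p_t)^γ shrinks the loss contribution of well-classified samples so training focuses on hard ones. A minimal pure-Python sketch of the binary form follows; the `alpha=0.25, gamma=2.0` defaults are the commonly used values, not parameters reported in this paper:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class (0 < p < 1)
    y: ground-truth label, 1 (positive) or 0 (negative)
    alpha: class-balancing weight for the positive class
    gamma: focusing parameter; gamma=0 recovers weighted cross-entropy
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights easy samples (p_t close to 1)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) contributes far less loss than a hard one (p = 0.1)
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

With γ = 2, the easy sample's loss is scaled by (1 − 0.9)² = 0.01, which is how the weight of easy-to-distinguish samples is "dynamically reduced" during training.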




Data Availability

The dataset can be accessed at the following URL: https://figshare.com/articles/dataset/DUO_zip/25370527 [32]. The code can be accessed at the following URL: https://figshare.com/articles/dataset/improved-yolov5/25375129 [33].

References

  1. Hou, W., Jing, H.: Rc-yolov5s: for tile surface defect detection. Vis. Comput. 40, 459–470 (2024)


  2. Sun, X., Shi, J., Liu, L., et al.: Transferring deep knowledge for object recognition in low-quality underwater videos. Neurocomputing 275, 897–908 (2018)


  3. Li, J., Chen, J., Sheng, B., et al.: Automatic detection and classification system of domestic waste via multimodel cascaded convolutional neural network. IEEE Trans. Ind. Inform. 18(1), 163–173 (2022)


  4. Wang, N., Chen, T., Liu, S., et al.: Deep learning-based visual detection of marine organisms: a survey. Neurocomputing 532, 1–32 (2023)


  5. Qiao, X., Bao, J., Zeng, L., et al.: An automatic active contour method for sea cucumber segmentation in natural underwater environments. Comput. Electron. Agric. 135, 134–142 (2017)


  6. Liu, H., Xu, Q., Liu, S., et al.: Evaluation of body weight of sea cucumber apostichopus japonicus by computer vision. Chin. J. Oceanol. Limnol. 33(1), 114–120 (2015)


  7. Khan, A., Fouda, M.M., Do, D.-T., et al.: Underwater target detection using deep learning: methodologies, challenges, applications, and future evolution. IEEE Access 12, 12618–12635 (2024)


  8. Liu, D., Cui, Y., Tan, W., et al.: Sg-net: Spatial granularity network for one-stage video instance segmentation. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 9811–9820 (2021)

  9. Cui, Y., Yan, L., Cao, Z., et al.: Tf-blender: Temporal feature blender for video object detection. In: Proc. IEEE Int. Conf. Comput. Vis., pp. 8118–8127 (2021)

  10. Cheng, B., Wei, Y., Shi, H., et al.: Revisiting rcnn: On awakening the classification power of faster rcnn. In: Lect. Notes Comput. Sci., pp. 473–490 (2018)

  11. Ren, S., He, K., Girshick, R., et al.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)


  12. He, K., Gkioxari, G., Dollar, P., et al.: Mask r-cnn. In: Proc IEEE Int Conf Comput Vision, pp. 2980–2988 (2017)

  13. Zeng, L., Sun, B., Zhu, D.: Underwater target detection based on faster r-cnn and adversarial occlusion network. Eng. Appl. Artif. Intell. 100, 104190 (2021)


  14. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: Unified, real-time object detection. In: Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit, pp. 779–788 (2016)

  15. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement. Preprint at arXiv:1804.02767 (2018)

  16. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit, pp. 7464–7475 (2023)

  17. Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. In: Proc. IEEE Conf.Comput. Vis. Pattern Recognit., pp. 6517–6525 (2017)

  18. Lin, X., Sun, S., Huang, W., et al.: Eapt: Efficient attention pyramid transformer for image processing. IEEE Trans. Multim. 25, 50–61 (2023)


  19. Li, X., Yu, H., Chen, H.: Multi-scale aggregation feature pyramid with cornerness for underwater object detection. Vis. Comput. 40, 1299–1310 (2024)


  20. Xie, Z., Zhang, W., Sheng, B., et al.: Bagfn: Broad attentive graph fusion network for high-order feature interactions. IEEE Trans. Neural Netw. Learn. Syst. 34(8), 4499–4513 (2023)


  21. Yang, Y., Chen, L., Zhang, J., et al.: Ugc-yolo: underwater environment object detection based on yolo with a global context block. J. Ocean Univ. China 22(3), 665–674 (2023)


  22. Liu, D., Cui, Y., Yan, L., et al.: Densernet: Weakly supervised visual localization using multi-scale feature aggregation. In: Proc. AAAI Conf. Artif. Intell., vol. 35, pp. 6101–6109 (2021)

  23. Wang, W., Han, C., Zhou, T., et al.: Visual recognition with deep nearest centroids. In: Proc. Int. Conf. Learn. Represent. (2023)

  24. Sun, Y., Zheng, W., Du, X., et al.: Underwater small target detection based on yolox combined with mobilevit and double coordinate attention. J. Mar. Sci. Eng. 11(6), 1178 (2023)


  25. Chen, Z., Qiu, G., Li, P., et al.: Mngnas: distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 45(11), 13489–13508 (2023)


  26. Hu, J., Shen, L., Albanie, S., et al.: Squeeze-and-excitation networks. In: Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit, pp. 7132–7141 (2018)

  27. Liu, C., Li, H., Wang, S., et al.: A dataset and benchmark of underwater object detection for robot picking. In: IEEE Int. Conf. Multimed. Expo Workshops, ICMEW, pp. 1–6 (2021)

  28. Everingham, M., Eslami, S.M.A., Van Gool, L., et al.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)


  29. Fu, C., Liu, R., Fan, X., et al.: Rethinking general underwater object detection: datasets, challenges, and solutions. Neurocomputing 517, 243–256 (2023)


  30. Jungseok, H., Michael, F., Junaed, S.: TrashCan: A Semantically-Segmented Dataset towards Visual Detection of Marine Debris. Preprint at arXiv:2007.08097 (2020)

  31. Liu, C., Wang, Z., Wang, S., et al.: A new dataset, poisson gan and aquanet for underwater object grabbing. IEEE Trans. Circuits Syst. Video Technol. 32(5), 2831–2844 (2022)


  32. Zhu, Q., Cen, Q., Wang, Y., et al.: DUO.zip. figshare. Dataset (2024). https://doi.org/10.6084/m9.figshare.25370527.v1

  33. Zhu, Q., Cen, Q., Wang, Y., et al.: improved-yolov5. figshare. Dataset (2024). https://doi.org/10.6084/m9.figshare.25375129.v1


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61773333 and 62273296.

Author information


Corresponding author

Correspondence to Qiang Cen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest with this work, and no commercial or associative interests that represent a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, Q., Cen, Q., Wang, Y. et al. An underwater target recognition algorithm incorporating improved attention mechanism and downsampling. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03437-9

