Lightweight Transformers make strong encoders for underwater object detection

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

Underwater object detection is widely used in ocean exploration tasks, where precise center localization helps users find objects of interest quickly and accurately. In recent years, underwater detectors based on convolutional neural networks (CNNs) have achieved great success. However, owing to the locality of convolution, CNN-based detectors usually struggle to explicitly model long-range dependencies. Transformers, in contrast, can capture global context, but their heavy memory and computation requirements severely reduce a detector's inference speed. In this paper, we propose the CSPTCenterNet underwater detector, which uses a proposed lightweight Transformer to extract global context, improving detection performance while maintaining real-time inference. In the upsampling stage, we fuse the encoded feature maps with high-resolution feature maps from the backbone network to restore the spatial details that Transformers lack. Finally, we train the network with GIoU loss and a multi-sample strategy to strengthen the detector's regression accuracy. Extensive experiments on an underwater dataset and the PASCAL VOC dataset demonstrate the effectiveness of the proposed method: it achieves the best detection performance while running 2 to 10 times faster than other state-of-the-art methods.
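To make the design concrete, here is a minimal PyTorch sketch of the two architectural ideas described above: a lightweight Transformer encoder applied to the coarsest backbone feature map to capture global context, and fusion of the encoded features with a higher-resolution backbone feature map during upsampling. This is not the authors' released code; the module name LightweightTransformerNeck and all sizes (channel widths, number of heads and layers) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightTransformerNeck(nn.Module):
    """Sketch: encode a coarse CNN feature map with a small Transformer,
    then upsample and fuse with a higher-resolution backbone feature map."""

    def __init__(self, in_channels=512, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, d_model, kernel_size=1)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead,
            dim_feedforward=2 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # 1x1 conv to align the high-resolution skip features before fusion.
        self.skip_proj = nn.Conv2d(in_channels // 2, d_model, kernel_size=1)

    def forward(self, low_res, high_res):
        # low_res:  (B, C, H, W)      coarsest backbone feature map
        # high_res: (B, C/2, 2H, 2W)  earlier, higher-resolution feature map
        x = self.proj(low_res)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per cell
        tokens = self.encoder(tokens)           # global self-attention
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = F.interpolate(x, scale_factor=2.0, mode="nearest")
        return x + self.skip_proj(high_res)     # restore spatial detail


# Example shapes: C5 output of a ResNet-style backbone fused with C4.
neck = LightweightTransformerNeck()
c5 = torch.randn(1, 512, 16, 16)
c4 = torch.randn(1, 256, 32, 32)
out = neck(c5, c4)  # -> (1, 256, 32, 32)
```

The GIoU loss used for box regression follows Rezatofighi et al. [21] and is standard; a direct implementation for (x1, y1, x2, y2) boxes looks like this:

```python
def giou_loss(pred, target, eps=1e-7):
    """Generalized IoU loss for boxes of shape (N, 4) in (x1, y1, x2, y2)."""
    # Intersection rectangle.
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    # Union area.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=eps)
    # Smallest enclosing box.
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    c_area = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=eps)
    giou = iou - (c_area - union) / c_area
    return (1.0 - giou).mean()
```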


Availability of data and materials

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

  1. Bello, I., Zoph, B., Vaswani, A., et al.: Attention augmented convolutional networks. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 3286–3295 (2019)

  2. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: Yolov4: Optimal speed and accuracy of object detection (2020). arXiv preprint arXiv:2004.10934

  3. Carion, N., Massa, F., Synnaeve, G., et al.: End-to-end object detection with transformers. In: European conference on computer vision, Springer, pp 213–229 (2020)

  4. Chen, Q., Wang, Y., Yang, T., et al.: You only look one-level feature. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13039–13048 (2021)

  5. Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020)

  6. Everingham, M., Van Gool, L., Williams, C.K.I., et al.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010). https://doi.org/10.1007/s11263-009-0275-4

  7. Fan, Z., Xia, W., Liu, X., et al.: Detection and segmentation of underwater objects from forward-looking sonar based on a modified Mask RCNN. SIViP 15(6), 1135–1143 (2021). https://doi.org/10.1007/s11760-020-01841-x

  8. Fu, C.Y., Liu, W., Ranga, A., et al.: DSSD: Deconvolutional single shot detector (2017). arXiv preprint arXiv:1701.06659

  9. Girshick, R., Donahue, J., Darrell, T., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587 (2014)

  10. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 (2016)

  11. He, K., Gkioxari, G., Dollár, P., et al.: Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969 (2017)

  12. Huang, H., Zhou, H., Yang, X., et al.: Faster R-CNN for marine organisms detection and recognition using data augmentation. Neurocomputing 337, 372–384 (2019). https://doi.org/10.1016/j.neucom.2019.01.084

  13. Law, H., Deng, J.: Cornernet: Detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750 (2018)

  14. Lin, T.Y., Goyal, P., Girshick, R., et al.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988 (2017)

  15. Liu, W., Anguelov, D., Erhan, D., et al.: SSD: Single Shot MultiBox Detector. In: Leibe B, Matas J, Sebe N, et al (eds) Computer Vision - ECCV 2016. Springer International Publishing, Cham, Lecture Notes in Computer Science, pp 21–37 (2016), https://doi.org/10.1007/978-3-319-46448-0_2

  16. Liu, Z., Lin, Y., Cao, Y., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 10012–10022 (2021)

  17. Pan, T.S., Huang, H.C., Lee, J.C., et al.: Multi-scale ResNet for real-time underwater object detection. SIViP 15(5), 941–949 (2021). https://doi.org/10.1007/s11760-020-01818-w

  18. Redmon, J., Farhadi, A.: Yolov3: An incremental improvement (2018). arXiv preprint arXiv:1804.02767

  19. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788 (2016)

  20. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031

  21. Rezatofighi, H., Tsoi, N., Gwak, J., et al.: Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 658–666 (2019)

  22. Srinivas, A., Lin, T.Y., Parmar, N., et al.: Bottleneck transformers for visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 16519–16529 (2021)

  23. Tian, Z., Shen, C., Chen, H., et al.: FCOS: Fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9627–9636 (2019)

  24. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. Advances in neural information processing systems 30 (2017)

  25. Zhang, S., Chi, C., Yao, Y., et al.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9759–9768 (2020)

  26. Zhang, X., Wan, F., Liu, C., et al.: Freeanchor: Learning to match anchors for visual object detection. Advances in neural information processing systems 32 (2019)

  27. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points (2019). arXiv preprint arXiv:1904.07850

  28. Zhu, X., Su, W., Lu, L., et al.: Deformable detr: Deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (2020)

Acknowledgements

This project was supported by Guangzhou Key Laboratory of Intelligent Agriculture (201902010081).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Hailong Liu and Jinrong Cui. The first draft of the manuscript was written by Hailong Liu, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Weifeng Zhang.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cui, J., Liu, H., Zhong, H. et al. Lightweight Transformers make strong encoders for underwater object detection. SIViP 17, 1889–1896 (2023). https://doi.org/10.1007/s11760-022-02400-2
