Dual Refinement Underwater Object Detection Network

Fan, Baojie; Chen, Wei; Cong, Yang; Tian, Jiandong

doi:10.1007/978-3-030-58565-5_17

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12365)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Due to the complex underwater environment, underwater imaging often suffers from blur, scale variation, color shift, and texture distortion. Generic detection algorithms do not work well when applied directly to underwater scenes. To address these problems, we propose an underwater detection framework with feature enhancement and anchor refinement. It uses a composite connection backbone to boost the feature representation and introduces a receptive field augmentation module to exploit multi-scale contextual features. The framework also provides a prediction refinement scheme over six prediction layers: by learning from offsets, it refines multi-scale features to better align with anchors, which alleviates the sample imbalance problem to a certain extent. We also construct a new underwater detection dataset, denoted UWD, which has more than 10,000 train-val and test underwater images. Extensive experiments on PASCAL VOC and UWD demonstrate the favorable performance of the proposed framework against state-of-the-art methods in terms of accuracy and robustness. Source code and models are available at: https://github.com/Peterchen111/FERNet.
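
The abstract does not give the box parameterization used for anchor refinement, so the following is only a minimal sketch of the general idea of refining anchors with learned offsets. It assumes the center-size encoding with variances (0.1, 0.2) that is common in SSD-style detectors; the function names `decode` and `refine_twice` are hypothetical and are not taken from the FERNet code.

```python
import math

# Assumption: boxes are (cx, cy, w, h) in normalized image coordinates and
# offsets are (dx, dy, dw, dh), as in common SSD-style detectors.
# The variances (0.1, 0.2) are a widespread convention, not a value
# confirmed by the abstract.
def decode(anchor, offset, variances=(0.1, 0.2)):
    """Apply one set of predicted offsets to one anchor box."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offset
    return (
        cx + dx * variances[0] * w,       # shift center x, scaled by width
        cy + dy * variances[0] * h,       # shift center y, scaled by height
        w * math.exp(dw * variances[1]),  # rescale width
        h * math.exp(dh * variances[1]),  # rescale height
    )

def refine_twice(anchor, first_offset, second_offset):
    """Hypothetical two-step refinement: the box decoded from the first
    offsets serves as the anchor for the second set of offsets."""
    refined_anchor = decode(anchor, first_offset)
    return decode(refined_anchor, second_offset)

if __name__ == "__main__":
    anchor = (0.5, 0.5, 0.2, 0.4)
    print(refine_twice(anchor, (1.0, 0.0, 0.0, 0.0), (0.0, -1.0, 0.0, 0.0)))
```

In this sketch the second prediction starts from a box that already sits closer to the object than the hand-designed anchor, which is the intuition behind refinement schemes of this kind.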

B. Fan and W. Chen: the first two authors contributed equally to this work.

Notes

1. Datasets Annotation Tool. https://github.com/tzutalin/labelImg.
2. Underwater Robot Picking Contest. http://www.cnurpc.org/.

Acknowledgments

This work is supported by the Ministry of Science and Technology of the People’s Republic of China (2019YFB1310300), National Natural Science Foundation of China (No. 61876092), State Key Laboratory of Robotics (No. 2019-O07) and State Key Laboratory of Integrated Service Network (ISN20-08).

Author information

Authors and Affiliations

College of Automation and College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Baojie Fan & Wei Chen
Shenyang Institute of Automation (SIA), Chinese Academy of Sciences, Shenyang, 110016, China
Yang Cong & Jiandong Tian

Corresponding author

Correspondence to Baojie Fan.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Fan, B., Chen, W., Cong, Y., Tian, J. (2020). Dual Refinement Underwater Object Detection Network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12365. Springer, Cham. https://doi.org/10.1007/978-3-030-58565-5_17

DOI: https://doi.org/10.1007/978-3-030-58565-5_17
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58564-8
Online ISBN: 978-3-030-58565-5
eBook Packages: Computer Science, Computer Science (R0)
