Abstract
Object detection is a fundamental problem in computer vision. Although impressive results have been achieved on large and medium-sized objects, detecting small objects remains challenging. Automatic ship detection in remote sensing images is an important module of maritime surveillance systems, and it is difficult because ships vary widely in appearance and scale. In this work, we thoroughly discuss the issues SSD faces with multi-scale objects and propose a multi-scale single-shot detector (MS-SSD) to improve the detection of small ship targets and enhance the model’s robustness to scale variance. MS-SSD benefits from two additions: (1) more high-level context and (2) more appropriate supervision. Extensive experiments on the Airbus Ship Detection Challenge dataset demonstrate the effectiveness of the proposed method for detecting ships against complex backgrounds in remote sensing images. We also achieve better detection performance on the COCO dataset, outperforming state-of-the-art approaches, especially for small targets.
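To make the scale issue concrete, the following is a minimal sketch, not the MS-SSD method itself. It assumes the linear default-box scale schedule from the original SSD formulation (s_min = 0.2, s_max = 0.9, six feature maps) and a 300-pixel input; the function names and the 16-pixel ship size are hypothetical choices made for illustration. It shows that a small target overlaps poorly with even the smallest default box, so it is unlikely to pass a typical 0.5 IoU matching threshold.

```python
def ssd_default_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Linear scale schedule: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1).

    Returns one relative scale (fraction of the input side) per feature map.
    The defaults are assumed from the original SSD formulation.
    """
    if num_maps < 2:
        raise ValueError("num_maps must be at least 2")
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def centered_square_iou(side_a, side_b):
    """IoU of two concentric, axis-aligned squares with the given side lengths.

    The smaller square lies inside the larger one, so the intersection is the
    smaller area and the union is the larger area.
    """
    small, large = sorted((side_a, side_b))
    if large <= 0:
        raise ValueError("side lengths must be positive")
    return (small / large) ** 2


if __name__ == "__main__":
    input_size = 300   # assumed input resolution in pixels
    ship_size = 16     # hypothetical small ship, in pixels

    for k, scale in enumerate(ssd_default_scales(), start=1):
        box = scale * input_size
        iou = centered_square_iou(ship_size, box)
        print(f"map {k}: default box {box:.0f} px, best-case IoU {iou:.3f}")
```

Even in the best case of perfect centering, the IoU is the squared ratio of the two side lengths, so it falls off quickly as the target shrinks relative to the default box. This is one way to see why supervision for small targets can be weak under a fixed scale schedule.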
Acknowledgments
This research was supported by the National Natural Science Foundation of China (No. 62076059) and the Science Project of Liaoning Province (2021-MS-105).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Wen, G., Cao, P., Wang, H. et al. MS-SSD: multi-scale single shot detector for ship detection in remote sensing images. Appl Intell 53, 1586–1604 (2023). https://doi.org/10.1007/s10489-022-03549-6