Detecting Small Objects in Thermal Images Using Single-Shot Detector

Hao Zhang, Xiang-gong Hong, and Li Zhu

doi:10.3103/S0146411621020097

Published: 14 May 2021

Volume 55, pages 202–211 (2021)

Automatic Control and Computer Sciences

Abstract

SSD (Single Shot MultiBox Detector) is one of the most successful object detectors owing to its high accuracy and fast speed. However, features from its shallow layers (mainly Conv4_3) lack semantic information, which results in poor performance on small objects. In this paper, we propose DDSSD (Dilation and Deconvolution Single Shot Multibox Detector), an enhanced SSD with a novel feature fusion module that improves on SSD for small object detection. In the feature fusion module, a dilated convolution module enlarges the receptive field of features from the shallow layer, and a deconvolution module increases the spatial size of feature maps from deeper layers. Our network achieves 79.7% mAP on the PASCAL VOC2007 test set and 28.3% mmAP on MS COCO test-dev at 41 FPS with only 300 × 300 input on a single Nvidia 1080 GPU. For small objects in particular, DDSSD achieves 10.5% on MS COCO and 22.8% on the FLIR thermal dataset, outperforming many state-of-the-art object detection algorithms in both accuracy and speed.
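The abstract names the two operations in the fusion module but not their settings. The Python sketch below works through the spatial arithmetic under assumed, hypothetical settings: a 3×3 convolution with dilation 2 on the 38×38 Conv4_3 map of SSD300, a 2×2 stride-2 deconvolution on the 19×19 fc7 map, and channel concatenation as the fusion step. These choices are illustrative only and are not taken from the paper; element-wise sum or product are common alternatives to concatenation.

```python
"""Shape and receptive-field arithmetic for a DDSSD-style fusion module.

All layer settings used below are hypothetical illustrations, not the
paper's configuration.
"""


def effective_kernel(kernel: int, dilation: int) -> int:
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0,
             dilation: int = 1) -> int:
    """Output length of a (possibly dilated) convolution along one axis."""
    span = effective_kernel(kernel, dilation)
    return (size + 2 * padding - span) // stride + 1


def deconv_out(size: int, kernel: int, stride: int = 1, padding: int = 0,
               output_padding: int = 0) -> int:
    """Output length of a transposed convolution (deconvolution) along one axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def rf_gain(kernel: int, dilation: int, jump: int) -> int:
    """Receptive-field growth (in input pixels) added by one layer.

    `jump` is the cumulative stride of the input map relative to the image;
    for Conv4_3 of VGG16 it is 8 (three 2x2 poolings).
    """
    return (effective_kernel(kernel, dilation) - 1) * jump


def fuse_concat(shallow: tuple, deep: tuple) -> tuple:
    """Concatenate two (C, H, W) maps along channels (assumed fusion rule)."""
    if shallow[1:] != deep[1:]:
        raise ValueError(f"spatial sizes differ: {shallow[1:]} vs {deep[1:]}")
    return (shallow[0] + deep[0],) + tuple(shallow[1:])


if __name__ == "__main__":
    # Conv4_3 of SSD300 is 38x38; dilation 2 with padding 2 keeps the size.
    print(conv_out(38, kernel=3, padding=2, dilation=2))  # 38
    # fc7 of SSD300 is 19x19; a 2x2 deconvolution with stride 2 doubles it.
    print(deconv_out(19, kernel=2, stride=2))  # 38
    # Receptive-field gain of one 3x3 layer on a stride-8 map, d=1 vs d=2.
    print(rf_gain(3, 1, 8), rf_gain(3, 2, 8))  # 16 32
    # Hypothetical 512-channel deconvolution output fused with Conv4_3.
    print(fuse_concat((512, 38, 38), (512, 38, 38)))  # (1024, 38, 38)
```

With dilation 2, the 3×3 kernel spans 5×5 input positions, so on a stride-8 map its receptive-field increment doubles from 16 to 32 pixels with the same nine weights per channel pair, while the matching padding preserves the 38×38 resolution that small objects depend on.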

Funding

This work was partly supported by the Innovation Fund for Graduate of Nanchang University under Grant CX2018145.

Author information

Authors and Affiliations

School of Information Engineering, Nanchang University, 330031, Nanchang, China
Hao Zhang, Xiang-gong Hong & Li Zhu

Corresponding author

Correspondence to Xiang-gong Hong.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Zhang, H., Hong, Xg. & Zhu, L. Detecting Small Objects in Thermal Images Using Single-Shot Detector. Aut. Control Comp. Sci. 55, 202–211 (2021). https://doi.org/10.3103/S0146411621020097

Received: 15 May 2020
Revised: 28 September 2020
Accepted: 15 October 2020
Published: 14 May 2021
Issue Date: March 2021
DOI: https://doi.org/10.3103/S0146411621020097