Abstract
SSD (Single Shot Multibox Detector) is one of the most successful object detectors owing to its high accuracy and speed. However, features from the shallow layers of SSD (mainly Conv4_3) lack semantic information, which results in poor performance on small objects. In this paper, we propose DDSSD (Dilation and Deconvolution Single Shot Multibox Detector), an enhanced SSD with a novel feature fusion module that improves on SSD for small object detection. In the feature fusion module, a dilated convolution module enlarges the receptive field of the shallow-layer features, and a deconvolution module increases the size of the feature maps from the deeper layers. Our network achieves 79.7% mAP on the PASCAL VOC2007 test set and 28.3% mmAP on MS COCO test-dev at 41 FPS with only 300 × 300 input on a single Nvidia 1080 GPU. For small objects in particular, DDSSD achieves 10.5% on MS COCO and 22.8% on the FLIR thermal dataset, outperforming many state-of-the-art object detection algorithms in both accuracy and speed.
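The size arithmetic behind the two operations in the fusion module can be sketched as follows. This is an illustrative calculation only, not the authors' configuration: the abstract does not give kernel sizes, strides, padding, or dilation rates, so all of those values are assumptions, as are the 38 × 38 and 19 × 19 map sizes (taken from the standard SSD300 layout for Conv4_3 and the next detection layer). The function names are hypothetical. The formulas are the usual output-size relations for dilated and transposed convolutions.

```python
def dilated_kernel_extent(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Output width/height of a (possibly dilated) convolution."""
    extent = dilated_kernel_extent(kernel_size, dilation)
    return (size + 2 * padding - extent) // stride + 1


def deconv_output_size(size: int, kernel_size: int, stride: int = 1,
                       padding: int = 0, output_padding: int = 0) -> int:
    """Output width/height of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding


# Assumed map sizes from the standard SSD300 layout (not stated in the abstract).
shallow_size = 38  # shallow feature map, e.g. Conv4_3
deep_size = 19     # next, deeper feature map

# Assumed hyperparameters, chosen only so the two branches end up the same size.
extent = dilated_kernel_extent(kernel_size=3, dilation=2)
shallow_out = conv_output_size(shallow_size, kernel_size=3, padding=2, dilation=2)
deep_up = deconv_output_size(deep_size, kernel_size=2, stride=2)

print(f"3x3 kernel with dilation 2 covers a {extent}x{extent} window")
print(f"shallow branch: {shallow_size} -> {shallow_out}")
print(f"deep branch:    {deep_size} -> {deep_up}")
```

Under these assumed settings, the dilated branch sees a wider window without changing the map size, and the deconvolution branch doubles the deeper map so the two can be fused element-wise or by concatenation.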
Funding
This work was partly supported by Innovation Fund for Graduate of Nanchang University under Grant CX2018145.
Ethics declarations
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Zhang, H., Hong, Xg. & Zhu, L. Detecting Small Objects in Thermal Images Using Single-Shot Detector. Aut. Control Comp. Sci. 55, 202–211 (2021). https://doi.org/10.3103/S0146411621020097
DOI: https://doi.org/10.3103/S0146411621020097