
Detecting Small Objects in Thermal Images Using Single-Shot Detector

Published in Automatic Control and Computer Sciences (2021)

Abstract

SSD (Single Shot MultiBox Detector) is one of the most successful object detectors, owing to its high accuracy and fast speed. However, the features from the shallow layers of SSD (mainly Conv4_3) lack semantic information, resulting in poor performance on small objects. In this paper, we propose DDSSD (Dilation and Deconvolution Single Shot Multibox Detector), an enhanced SSD with a novel feature fusion module that improves small-object detection. In the feature fusion module, a dilated convolution module is utilized to enlarge the receptive field of features from the shallow layer, and a deconvolution module is adopted to increase the size of feature maps from a deeper layer. Our network achieves 79.7% mAP on the PASCAL VOC2007 test set and 28.3% mAP on MS COCO test-dev at 41 FPS, with only a 300 × 300 input, on a single Nvidia 1080 GPU. In particular, for small objects, DDSSD achieves 10.5% on MS COCO and 22.8% on the FLIR thermal dataset, outperforming many state-of-the-art object detection algorithms in both accuracy and speed.
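The two branches of the feature fusion module rest on simple size arithmetic: a dilated convolution widens the receptive field of shallow features without extra parameters, while a transposed (deconvolution) layer upsamples deeper feature maps so they can be fused with shallow ones. The sketch below illustrates that arithmetic; the specific kernel sizes, strides, and padding are assumptions for illustration, not values reported in the paper.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Effective spatial extent of a dilated convolution kernel."""
    return kernel + (kernel - 1) * (dilation - 1)

def deconv_output(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a transposed convolution (deconvolution) layer."""
    return (size - 1) * stride - 2 * padding + kernel

# A 3x3 convolution with dilation 2 covers a 5x5 region, enlarging the
# receptive field of shallow (e.g. Conv4_3) features at no parameter cost.
assert effective_kernel(3, dilation=2) == 5

# With an assumed 4x4 kernel, stride 2, padding 1, a deconvolution
# upsamples SSD300's 19x19 Conv7 map to 38x38, matching Conv4_3 for fusion.
assert deconv_output(19, kernel=4, stride=2, padding=1) == 38
```

Once the deep map is upsampled to the shallow map's resolution, the two can be fused element-wise or by concatenation before prediction.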



Funding

This work was partly supported by Innovation Fund for Graduate of Nanchang University under Grant CX2018145.

Author information

Corresponding author: Xiang-gong Hong.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article


Cite this article

Zhang, H., Hong, X.-g. & Zhu, L. Detecting Small Objects in Thermal Images Using Single-Shot Detector. Aut. Control Comp. Sci. 55, 202–211 (2021). https://doi.org/10.3103/S0146411621020097
