S3OD: Single Stage Small Object Detector from Scratch for Remote Sensing Images

  • Feng Yang
  • Wentong Li
  • Wanyi Li
  • Peng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11903)


Small object detection is an important but challenging computer vision task in both natural scenes and remote sensing scenes. Because of large variations in object density, low contrast, sparse texture, and arbitrary orientations, many advanced algorithms for small object detection in natural scenes suffer a sharp performance drop when applied directly to remote sensing images. In addition, most state-of-the-art object detectors are fine-tuned from off-the-shelf networks pretrained on large-scale classification datasets such as ImageNet, which can introduce learning bias and makes the architecture inconvenient to modify for remote sensing object detection tasks. To tackle these problems, we train a robust Single Stage Small Object Detector (S3OD) from scratch, which can efficiently detect small-dense and small-dispersed objects in remote sensing images. The proposed S3OD adopts a small down-sampling factor to preserve accurate location information and maintains high spatial resolution for small objects by introducing a new dilated residual block in the deeper layers. In particular, a two-branch dilated feature attention module is proposed to enlarge the valid receptive field and produce an effective attention feature map for small-dense and small-dispersed object detection. By employing BatchNorm in both the backbone and detection head subnetworks, S3OD can be trained from scratch stably while maintaining comparable performance. Experiments conducted on our self-built Remote Sensing Small Object (RSSO) dataset show that S3OD achieves state-of-the-art accuracy for small object detection and even outperforms several pretrained one-stage methods.
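
The trade-off the abstract describes, a small down-sampling factor plus dilation in the deeper layers, can be illustrated with simple arithmetic. The sketch below is not the S3OD architecture: the two layer stacks, the 512-pixel input, and the function names are hypothetical choices made only to compare a conventionally strided stack with one that replaces its last two strided layers by dilated, stride-1 layers.

```python
def conv_out_size(size, kernel, stride=1, dilation=1):
    """Output length along one spatial axis, using 'same'-style padding."""
    effective_kernel = dilation * (kernel - 1) + 1
    padding = (effective_kernel - 1) // 2
    return (size + 2 * padding - effective_kernel) // stride + 1


def stack_summary(input_size, layers):
    """Summarise a stack of conv layers given as (kernel, stride, dilation).

    Returns (output_size, receptive_field, total_stride).
    """
    size, receptive_field, total_stride = input_size, 1, 1
    for kernel, stride, dilation in layers:
        size = conv_out_size(size, kernel, stride, dilation)
        # Each layer widens the receptive field by (k - 1) * dilation,
        # measured in input pixels via the stride accumulated so far.
        receptive_field += (kernel - 1) * dilation * total_stride
        total_stride *= stride
    return size, receptive_field, total_stride


# Hypothetical stack A: five 3x3 convolutions, all with stride 2.
strided = [(3, 2, 1)] * 5
# Hypothetical stack B: three stride-2 layers, then two stride-1 layers
# with dilation 2, so the feature map stops shrinking after 8x reduction.
dilated = [(3, 2, 1)] * 3 + [(3, 1, 2)] * 2

print(stack_summary(512, strided))  # (16, 63, 32)
print(stack_summary(512, dilated))  # (64, 79, 8)
```

Under these assumptions, the dilated stack keeps a 64-cell feature map (down-sampling factor 8) while covering a larger receptive field than the fully strided stack, whose 16-cell map (factor 32) would squeeze a 16-pixel object into half a cell.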


Keywords: Object detection, Remote sensing images, Small objects, Convolutional neural networks



This work was supported by the National Natural Science Foundation of China (No. 91748131, No. 61771471, No. 61374159), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (No. 2015112), the Foundation of CETC Key Laboratory of Data Link Technology (CLDL-20182316, CLDL-20182203), the Natural Science Foundation of Shaanxi Province (No. 2018MJ6048), and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (No. ZZ2019178).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Northwestern Polytechnical University, Xi’an, People’s Republic of China
  2. Key Laboratory of Information Fusion Technology, Ministry of Education, Xi’an, People’s Republic of China
  3. Institute of Automation, Chinese Academy of Sciences, Beijing, People’s Republic of China
  4. CETC Key Laboratory of Data Link Technology, Xi’an, People’s Republic of China
