Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images

  • Research Article
Journal of the Indian Society of Remote Sensing

Abstract

With the rapid development of deep learning in recent years, object detection in natural-scene images has improved greatly. However, objects in remote sensing images vary widely in scale, and many of them are small, which lowers overall detection accuracy and causes small objects to be missed. To address this issue, we propose a novel method for small object detection in optical remote sensing images. First, a spatial-transformer module is constructed from spatial attention and self-attention to extract features from key regions of the image. Then, a cross-scale feature-fusion module is constructed in the neck of the network to fuse similar feature information across levels. Finally, the bounding box screening rules are reconstructed so that the optimal bounding box is selected and interference from redundant bounding boxes is ignored. We evaluate our method on the remote sensing image datasets DIOR, DOTA, and AI-TOD and compare it with other state-of-the-art object detection methods. The proposed method improves overall accuracy by 2%, 2.9%, and 2.7%, respectively, and also considerably improves small object detection accuracy.
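The abstract does not spell out the reconstructed bounding box screening rules, so they are not reproduced here. As a point of reference only, the sketch below shows standard greedy non-maximum suppression (NMS), the conventional screening step that modified rules of this kind usually replace. It is not the authors' method. The function names `iou` and `greedy_nms`, the corner box format `(x1, y1, x2, y2)`, and the threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative value, not taken from the paper
) -> List[int]:
    """Standard greedy NMS (baseline only, not the paper's screening rule).

    Visits boxes in descending score order and keeps a box only if its IoU
    with every already-kept box is at most iou_threshold.
    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(greedy_nms(demo_boxes, demo_scores))
```

For small objects, a shift of a few pixels changes IoU sharply, so a fixed IoU threshold can suppress correct neighbouring detections or retain duplicates. That sensitivity is one common motivation for revising the screening step.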

Acknowledgements

This research was funded by the Advance Research Program (09500094).

Author information

Corresponding author

Correspondence to Jinhui Lan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lan, J., Zhang, C., Lu, W. et al. Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images. J Indian Soc Remote Sens 51, 1427–1439 (2023). https://doi.org/10.1007/s12524-023-01709-w

  • DOI: https://doi.org/10.1007/s12524-023-01709-w
