Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images

Lan, Jinhui; Zhang, Cheng; Lu, Weijian; Gu, Naiwei

doi:10.1007/s12524-023-01709-w

Research Article
Published: 02 June 2023

Volume 51, pages 1427–1439 (2023)

Journal of the Indian Society of Remote Sensing

Jinhui Lan¹,
Cheng Zhang ORCID: orcid.org/0000-0002-6739-8530¹,
Weijian Lu² &
Naiwei Gu²


Abstract

Deep learning has substantially improved object detection in natural-scene images in recent years. Objects in remote sensing images, however, vary widely in scale, and many of them are small, which lowers overall detection accuracy and causes small objects to be missed. To address this issue, we propose a novel method for small object detection in optical remote sensing images. First, a spatial-transformer module built from spatial attention and self-attention extracts features from key regions of the image. Then, a cross-scale feature-fusion module in the neck of the network fuses similar feature information from different levels. Finally, the bounding-box screening rules are redesigned so that the optimal bounding box is selected and the interference of redundant bounding boxes is ignored. We evaluate the method on the remote sensing datasets DIOR, DOTA, and AI-TOD and compare it with other state-of-the-art object detection methods. The proposed method improves overall accuracy by 2%, 2.9%, and 2.7% on the three datasets, respectively, and also considerably improves small object detection accuracy.
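
The abstract does not spell out the redesigned bounding-box screening rules. For orientation only, the minimal sketch below shows standard greedy non-maximum suppression (NMS), the conventional screening step that such rules usually modify. It is not the authors' method. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Standard greedy NMS: return indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this baseline a lower-scoring box is discarded outright once its IoU with a kept box exceeds the threshold, which is one reason densely packed small objects can be lost; variants such as Soft-NMS decay the score instead of discarding the box.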

Acknowledgements

This research was funded by the Advance Research Program (09500094).

Author information

Cheng Zhang and Jinhui Lan contributed equally.

Authors and Affiliations

School of Automation and Electrical Engineering, University of Science and Technology Beijing, Xueyuanlu, Beijing, 100083, People’s Republic of China
Jinhui Lan & Cheng Zhang
Beijing Institute of Space Launch Technology, Beijing, People’s Republic of China
Weijian Lu & Naiwei Gu

Corresponding author

Correspondence to Jinhui Lan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lan, J., Zhang, C., Lu, W. et al. Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images. J Indian Soc Remote Sens 51, 1427–1439 (2023). https://doi.org/10.1007/s12524-023-01709-w

Received: 22 December 2022
Accepted: 18 April 2023
Published: 02 June 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12524-023-01709-w
