Abstract
Deep learning has greatly improved object detection in natural-scene images in recent years. Objects in remote sensing images, however, vary widely in scale and many of them are small, so overall detection accuracy is low and small objects are often missed. To address this issue, we propose a novel method for small object detection in optical remote sensing images. First, a spatial-transformer module built from spatial attention and self-attention extracts features from key regions of the image space. Then, a cross-scale feature-fusion module in the neck of the network fuses similar feature information from different levels. Finally, the bounding-box screening rules are reconstructed so that the optimal bounding box is selected and interference from redundant bounding boxes is ignored. We evaluate our method experimentally on the remote sensing image datasets DIOR, DOTA, and AI-TOD and compare it with other state-of-the-art object detection methods. The proposed method improves overall accuracy by 2%, 2.9%, and 2.7% on the three datasets, respectively, and also considerably improves small object detection accuracy.
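The abstract does not spell out the reconstructed bounding-box screening rule. For orientation only, the sketch below shows conventional greedy non-maximum suppression (NMS), the usual baseline that screening rules of this kind revise. It is not the proposed method: the function names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Baseline greedy NMS: return indices of kept boxes, best score first."""
    # Visit candidates from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this baseline, the second box overlaps the first with an IoU of 81/119 and is discarded as redundant, while the distant third box survives. A hard IoU threshold like this can suppress true detections when small objects are densely packed, which is one common motivation for revising the screening rule.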
Acknowledgements
This research was funded by the Advance Research Program (09500094).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lan, J., Zhang, C., Lu, W. et al. Spatial-Transformer and Cross-Scale Fusion Network (STCS-Net) for Small Object Detection in Remote Sensing Images. J Indian Soc Remote Sens 51, 1427–1439 (2023). https://doi.org/10.1007/s12524-023-01709-w
DOI: https://doi.org/10.1007/s12524-023-01709-w