Spatial attention model based target detection for aerial robotic systems

  • Meng Zhang
  • Shicheng Wang
  • Dongfang Yang
  • Yongfei Li
  • Hao He
Regular Paper


Detecting targets of interest from aerial robotic systems is a challenging task. Because air-to-ground observation involves long viewing distances, targets appear small and a scene often contains many of them. In addition, each target occupies only a small part of the image, and a complex background can easily mask its feature information. This paper proposes a novel target detection method based on a spatial attention model. It departs from existing methods, which enhance the features of target areas by enhancing global semantic information. By learning feature weights for different spatial locations in feature space, the proposed method focuses attention on the target regions of interest in an image and suppresses interfering background features. This strengthens the feature information of the target regions and mitigates the class imbalance problem in detection. The experimental results show that the algorithm improves detection accuracy for small air-to-ground targets and performs well in dense target areas. Compared with RefineDet, a state-of-the-art small target detector, our method achieves better performance at a lower cost.
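The idea of weighting spatial locations can be illustrated with a minimal sketch. It is not the network described in the paper: the function name `spatial_attention`, the fixed `weights` and `bias`, and the toy feature map are all assumptions made for illustration, and a real model would learn the weights during training. The sketch computes one score per location with a 1x1 convolution across channels, squashes it to (0, 1) with a sigmoid, and multiplies every channel by the resulting mask.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(features, weights, bias=0.0):
    """Reweight a C x H x W feature map with one attention weight per location.

    features: list of C channels, each an H x W list of lists of floats.
    weights:  list of C floats (hypothetical 1x1-convolution weights; in a
              trained model these would be learned, here they are fixed).
    Returns (attended, mask), where mask is H x W with values in (0, 1).
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    # One scalar score per spatial location: a 1x1 convolution across channels.
    mask = [
        [
            sigmoid(bias + sum(weights[c] * features[c][i][j] for c in range(channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Apply the same mask to every channel: locations with a high weight keep
    # their features, locations with a low weight are damped.
    attended = [
        [[features[c][i][j] * mask[i][j] for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]
    return attended, mask


# Toy 2-channel, 2x2 feature map: strong response at the top-left location
# (standing in for a target), weak responses elsewhere (background).
features = [
    [[4.0, 0.1], [0.2, 0.0]],
    [[3.0, 0.0], [0.1, 0.2]],
]
attended, mask = spatial_attention(features, weights=[1.0, 1.0], bias=-2.0)
```

In this toy input the top-left location receives a mask value close to 1 while the other three locations are strongly damped, which is the qualitative behavior the abstract describes: target regions are emphasized and background features are suppressed.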


  • Spatial attention model
  • Aerial robotic systems
  • Small target detection
  • Dense targets detection
  • Deep learning



This paper is supported by the National Natural Science Foundation of China (Grant nos. 61673017, 61403398), and the Natural Science Foundation of Shanxi Province (Grant nos. 2017JM6077, 2018ZDXM-GY-039).



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Xi’an High Tech Research Institution, Xi’an, China
