Fewer is more: efficient object detection in large aerial images

  • Research Paper
  • Science China Information Sciences

Abstract

Mainstream object detection methods for large aerial images usually divide each image into patches and then exhaustively run the detector on every patch, regardless of whether it contains any objects. This paradigm is effective but inefficient: because the detector must process all patches, inference speed suffers severely. This paper presents an objectness activation network (OAN) that helps detectors focus on fewer patches while achieving both faster inference and more accurate results, giving a simple and effective solution to object detection in large images. In brief, OAN is a lightweight fully convolutional network that judges whether each patch contains objects; it can be easily integrated into many object detectors and trained jointly with them end-to-end. We extensively evaluate OAN with five advanced detectors. With OAN, all five detectors achieve a speed-up of more than 30.0% on three large-scale aerial image datasets, together with consistent accuracy improvements. On extremely large Gaofen-2 images (29200 × 27620 pixels), OAN improves detection speed by 70.5%. Moreover, we extend OAN to driving-scene object detection and 4K video object detection, boosting detection speed by 112.1% and 75.0%, respectively, without sacrificing accuracy.
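
The patch-then-filter idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`tile_starts`, `tile_grid`, `gated_detect`), the default patch size and stride, and the 0.5 threshold are all hypothetical, and the objectness gate is modelled as a standalone callable, whereas the abstract describes OAN as integrated into the detector and trained jointly with it.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixel coordinates


def tile_starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis so that patches cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # extra patch flush with the border
    return starts


def tile_grid(width: int, height: int,
              patch: int = 1024, stride: int = 824) -> Iterator[Box]:
    """Yield overlapping patch boxes; patch and stride are assumed values."""
    for y in tile_starts(height, patch, stride):
        for x in tile_starts(width, patch, stride):
            yield (x, y, min(x + patch, width), min(y + patch, height))


def gated_detect(boxes: Iterable[Box],
                 objectness: Callable[[Box], float],
                 detector: Callable[[Box], list],
                 threshold: float = 0.5) -> Tuple[list, int]:
    """Run `detector` only on patches whose objectness score passes the gate.

    Returns the collected detections and the number of patches that were
    actually passed to the detector.
    """
    kept, results = 0, []
    for box in boxes:
        if objectness(box) < threshold:
            continue  # patch judged empty: skip the expensive detector call
        kept += 1
        results.extend(detector(box))
    return results, kept
```

In this sketch the detector is invoked only `kept` times rather than once per patch, so any saving depends on how many patches the gate rejects and on how cheap the gate is compared with the detector.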

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62136007, 62376223), the Natural Science Basic Research Program of Shaanxi (Grant Nos. 2021JC-16, 2023-JC-ZD-36), the Fundamental Research Funds for the Central Universities, and the Doctorate Foundation of Northwestern Polytechnical University (Grant No. CX2021082).

Author information

Corresponding author

Correspondence to Gong Cheng.

About this article

Cite this article

Xie, X., Cheng, G., Li, Q. et al. Fewer is more: efficient object detection in large aerial images. Sci. China Inf. Sci. 67, 112106 (2024). https://doi.org/10.1007/s11432-022-3718-5

  • DOI: https://doi.org/10.1007/s11432-022-3718-5