Abstract
Current mainstream object detection methods for large aerial images usually divide each large image into patches and then exhaustively detect objects of interest on every patch, regardless of whether the patch contains any objects. Although effective, this paradigm is inefficient: the detector must process every patch, which severely limits inference speed. This paper presents an objectness activation network (OAN) that helps detectors focus on fewer patches while achieving faster inference and more accurate results, providing a simple and effective solution to object detection in large images. In brief, OAN is a lightweight fully convolutional network that judges whether each patch contains objects; it can be easily integrated into many object detectors and trained jointly with them end-to-end. We extensively evaluate OAN with five advanced detectors. With OAN, all five detectors achieve a speed-up of more than 30.0% on three large-scale aerial image datasets, along with consistent accuracy improvements. On extremely large Gaofen-2 images (29200 × 27620 pixels), OAN improves detection speed by 70.5%. Moreover, we extend OAN to driving-scene object detection and 4K video object detection, boosting detection speed by 112.1% and 75.0%, respectively, without sacrificing accuracy.
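The patch-gating idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' OAN implementation: `objectness` is a hypothetical stand-in for the OAN head (in the paper, a learned fully convolutional network trained jointly with the detector), `detector` is any patch-level detector returning patch-local boxes, and the patch size (1024), overlap (200), and threshold (0.5) are illustrative values rather than settings taken from the paper.

```python
from typing import Callable, List, Tuple

# A window or box as (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]
# A detection as (x0, y0, x1, y1, score).
Detection = Tuple[int, int, int, int, float]


def tile_image(width: int, height: int, patch: int = 1024, overlap: int = 200) -> List[Box]:
    """Split a width x height image into overlapping windows (sliding-window tiling)."""
    stride = patch - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the patch size")

    def starts(length: int) -> List[int]:
        if length <= patch:
            return [0]
        offsets = list(range(0, length - patch, stride))
        offsets.append(length - patch)  # last window sits flush with the image border
        return offsets

    return [(x, y, min(x + patch, width), min(y + patch, height))
            for y in starts(height) for x in starts(width)]


def detect_large_image(
    width: int,
    height: int,
    objectness: Callable[[Box], float],          # hypothetical stand-in for the OAN head
    detector: Callable[[Box], List[Detection]],  # any patch-level detector (patch-local boxes)
    threshold: float = 0.5,
    patch: int = 1024,
    overlap: int = 200,
) -> Tuple[List[Detection], int, int]:
    """Run the detector only on windows whose objectness score reaches `threshold`.

    Returns (raw detections in image coordinates, #windows detected, #windows total).
    Cross-window merging (e.g. NMS over overlapping tiles) is intentionally omitted.
    """
    windows = tile_image(width, height, patch, overlap)
    active = [w for w in windows if objectness(w) >= threshold]
    detections: List[Detection] = []
    for window in active:
        ox, oy = window[0], window[1]
        for x0, y0, x1, y1, score in detector(window):
            # Shift patch-local coordinates back into full-image coordinates.
            detections.append((x0 + ox, y0 + oy, x1 + ox, y1 + oy, score))
    return detections, len(active), len(windows)


if __name__ == "__main__":
    objects = [(150, 150), (2500, 400)]  # toy object centres (hypothetical data)

    def contains(w: Box, c: Tuple[int, int]) -> bool:
        return w[0] <= c[0] < w[2] and w[1] <= c[1] < w[3]

    def toy_objectness(w: Box) -> float:
        # Oracle gate: 1.0 if the window holds any object centre, else 0.0.
        return 1.0 if any(contains(w, c) for c in objects) else 0.0

    def toy_detector(w: Box) -> List[Detection]:
        # A 20x20 box around each object centre, in window-local coordinates.
        return [(cx - w[0] - 10, cy - w[1] - 10, cx - w[0] + 10, cy - w[1] + 10, 0.9)
                for cx, cy in objects if contains(w, (cx, cy))]

    dets, n_active, n_total = detect_large_image(3000, 3000, toy_objectness, toy_detector)
    print(f"ran detector on {n_active}/{n_total} patches, {len(dets)} raw detections")
    # prints: ran detector on 3/16 patches, 3 raw detections
```

Only windows that pass the objectness gate reach the detector, which is where the saving comes from; the trade-off is that a wrongly rejected window loses its objects, so the threshold balances recall against speed. The toy run also shows why overlapping tiles need a cross-patch merging step: the object at (2500, 400) lies in two overlapping windows and is reported twice.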
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62136007, 62376223), the Natural Science Basic Research Program of Shaanxi (Grant Nos. 2021JC-16, 2023-JC-ZD-36), the Fundamental Research Funds for the Central Universities, and the Doctorate Foundation of Northwestern Polytechnical University (Grant No. CX2021082).
About this article
Cite this article
Xie, X., Cheng, G., Li, Q. et al. Fewer is more: efficient object detection in large aerial images. Sci. China Inf. Sci. 67, 112106 (2024). https://doi.org/10.1007/s11432-022-3718-5