Abstract
RoIPool/RoIAlign is an indispensable step in typical two-stage object detectors: it rescales object proposals cropped from the feature pyramid into fixed-size feature maps. However, these cropped feature maps cover only local receptive fields and therefore lose much of the global context. To tackle this problem, we propose a novel end-to-end trainable framework, called global context aware (GCA) RCNN, which helps the network strengthen the spatial correlation between background and foreground by fusing global context information. The core component of the GCA framework is a context-aware mechanism in which a global feature pyramid and an attention strategy are used for feature extraction and feature refinement, respectively. Specifically, we leverage dense connections to improve the flow of global context across stages in the top-down pathway of the FPN, and then use an attention mechanism to refine the global context at each level of the feature pyramid. Finally, we present a lightweight version of our method that only slightly increases model complexity and computational cost. Experimental results on the COCO benchmark demonstrate the significant advantages of our approach.
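To make the abstract's pipeline concrete, the following is a minimal NumPy sketch, not the authors' implementation: each pyramid level is refined by SE-style channel attention, pooled into a global context vector, and that context is broadcast-added onto a fixed-size RoI feature. The function names (`se_attention`, `fuse_global_context`), the bottleneck ratio, and the averaging across levels are all illustrative assumptions.

```python
import numpy as np

def global_avg_pool(feat):
    # feat: (C, H, W) -> channel descriptor of shape (C,)
    return feat.mean(axis=(1, 2))

def se_attention(feat, w1, w2):
    # SE-style channel attention: squeeze, bottleneck, excite.
    z = global_avg_pool(feat)              # squeeze: (C,)
    h = np.maximum(w1 @ z, 0.0)            # ReLU bottleneck: (C//r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid channel gates: (C,)
    return feat * s[:, None, None]         # reweight each channel map

def fuse_global_context(roi_feat, pyramid_feats, w1, w2):
    # Refine every pyramid level with attention, pool each to a global
    # context vector, average across levels, and broadcast-add the
    # result onto the cropped RoI feature.
    ctx = np.zeros(roi_feat.shape[0])
    for p in pyramid_feats:
        ctx += global_avg_pool(se_attention(p, w1, w2))
    ctx /= len(pyramid_feats)
    return roi_feat + ctx[:, None, None]

# Toy example: 8 channels, 3 pyramid levels, a 7x7 RoI feature.
rng = np.random.default_rng(0)
C, r = 8, 2
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
levels = [rng.standard_normal((C, s, s)) for s in (32, 16, 8)]
roi = rng.standard_normal((C, 7, 7))
out = fuse_global_context(roi, levels, w1, w2)
print(out.shape)  # (8, 7, 7)
```

The key design point mirrored here is that the global signal is computed from whole-image pyramid features rather than from the RoI crop, so the fused RoI feature retains background/foreground correlations that RoIAlign alone discards.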
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61802055 and 61773068) and the Fundamental Research Funds for the Central Universities (No. N2024005-1).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, W., Fu, C., Xie, H. et al. Global context aware RCNN for object detection. Neural Comput & Applic 33, 11627–11639 (2021). https://doi.org/10.1007/s00521-021-05867-1