Abstract
Current two-stage object detectors, which consist mainly of a region proposal stage and a proposal recognition stage, may produce unreliable results for objects that carry little visual information, such as small or occluded objects. The cause is poor region proposals and inaccurate proposal recognition. To address this problem, we propose a context augmentation algorithm that uses contextual information to generate high-quality region proposals and detection results. First, region proposals are produced in two steps: 1) a coarse set of region proposals is generated, some of which are reliable and some ambiguous, and 2) the ambiguous region proposals are re-estimated using appearance and geometry information with respect to the reliable region proposals from step 1. Second, similar pair-wise relations between region proposals are used to produce global feature information associated with each region proposal in order to enhance recognition results. In practice, our method improves both the quality of the region proposals and the recognition results. Empirical studies show that the proposed context augmentation yields substantial and consistent improvements over the baseline Faster R-CNN. Moreover, it gives an improvement of around 1.3% mAP over Mask R-CNN on the COCO dataset.
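The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how proposals are split, how appearance and geometry are combined, or how the global feature is formed. The score threshold, the cosine appearance term, the distance-based geometry term, the mixing weight `alpha`, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (appearance term, assumed)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def center(box):
    """Center of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def geometry_affinity(box_a, box_b):
    """Geometry term (assumed form): decays with center distance, scaled by box_b's size."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    dist = math.hypot(ax - bx, ay - by)
    scale = math.sqrt((box_b[2] - box_b[0]) * (box_b[3] - box_b[1]))
    return math.exp(-dist / scale) if scale > 0 else 0.0

def relation_weight(p, r):
    """Pair-wise relation: non-negative appearance similarity times geometry affinity."""
    return max(0.0, cosine(p["feat"], r["feat"])) * geometry_affinity(p["box"], r["box"])

def split_proposals(proposals, threshold=0.7):
    """Step 1 (assumed rule): proposals scoring at or above the threshold count as reliable."""
    reliable = [p for p in proposals if p["score"] >= threshold]
    ambiguous = [p for p in proposals if p["score"] < threshold]
    return reliable, ambiguous

def reestimate(ambiguous, reliable, alpha=0.5):
    """Step 2: blend each ambiguous score with a relation-weighted mean of reliable scores."""
    rescored = []
    for p in ambiguous:
        weights = [relation_weight(p, r) for r in reliable]
        total = sum(weights)
        # With no related reliable proposal, the score is left unchanged.
        context = (sum(w * r["score"] for w, r in zip(weights, reliable)) / total
                   if total > 0 else p["score"])
        rescored.append(dict(p, score=(1 - alpha) * p["score"] + alpha * context))
    return rescored

def context_feature(p, others):
    """Global context feature: relation-weighted mean of the other proposals' features."""
    weights = [relation_weight(p, o) for o in others]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(p["feat"])
    return [sum(w * o["feat"][i] for w, o in zip(weights, others)) / total
            for i in range(len(p["feat"]))]

# Toy data (hypothetical): one confident proposal, one nearby look-alike, one unrelated.
proposals = [
    {"box": (10, 10, 50, 50), "score": 0.9, "feat": [1.0, 0.0]},
    {"box": (55, 10, 95, 50), "score": 0.4, "feat": [0.9, 0.1]},
    {"box": (300, 300, 320, 320), "score": 0.3, "feat": [0.0, 1.0]},
]
reliable, ambiguous = split_proposals(proposals)
rescored = reestimate(ambiguous, reliable)
for before, after in zip(ambiguous, rescored):
    print(before["score"], "->", round(after["score"], 3))
```

In this toy setup the same `relation_weight` serves both purposes, mirroring the abstract's remark that similar pair-wise relations are used for proposal re-estimation and for feature enhancement. A recognition head could, for instance, concatenate `p["feat"]` with `context_feature(p, others)`, though that choice is also an assumption.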
Acknowledgments
This project was partially supported by a grant from the Natural Science Foundation of China (71671178) and by the Fundamental Research Funds for the Central Universities.
About this article
Cite this article
Leng, J., Liu, Y. Context augmentation for object detection. Appl Intell 52, 2621–2633 (2022). https://doi.org/10.1007/s10489-020-02037-z
DOI: https://doi.org/10.1007/s10489-020-02037-z