Context augmentation for object detection

Abstract

Current two-stage object detectors, which consist mainly of a region proposal stage and a proposal recognition stage, may produce unreliable results for objects that offer little information, such as small and occluded objects. The cause is poor region proposals and inaccurate proposal recognition. To address this problem, we propose a context augmentation algorithm that exploits contextual information to generate high-quality region proposals and detection results. First, region proposals are produced in two steps: 1) a coarse set of region proposals is generated, some of which are reliable and some ambiguous, and 2) the ambiguous region proposals are re-estimated using appearance and geometry information with respect to the reliable region proposals from step 1. Second, similar pair-wise relations between region proposals are used to produce global feature information associated with the region proposals, which enhances the recognition results. In practice, our method improves the quality of both the region proposals and the recognition results. Empirical studies show that the proposed context augmentation yields substantial and consistent improvements over the Faster R-CNN baseline. Moreover, it improves mAP by around 1.3% over Mask R-CNN on the COCO dataset.
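To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that proposals are split into reliable and ambiguous sets by two score thresholds (hi, lo), that appearance similarity is the cosine similarity of proposal features, that geometry is a centre distance scaled by box size, and that an ambiguous score is blended with the context score using a fixed weight alpha. All of these names, thresholds, and formulas are hypothetical stand-ins for whatever learned components the paper actually uses.

```python
import numpy as np


def relation_weights(q_boxes, q_feats, k_boxes, k_feats):
    """Softmax weight of every query proposal over the key proposals.

    Appearance term: cosine similarity of feature vectors.
    Geometry term: negative centre distance, scaled by the query box size.
    Both terms are illustrative assumptions, not taken from the paper.
    Boxes are (x1, y1, x2, y2).
    """
    eps = 1e-8
    qn = q_feats / (np.linalg.norm(q_feats, axis=1, keepdims=True) + eps)
    kn = k_feats / (np.linalg.norm(k_feats, axis=1, keepdims=True) + eps)
    appearance = qn @ kn.T                                  # shape (Q, K)

    q_ctr = (q_boxes[:, :2] + q_boxes[:, 2:]) / 2.0
    k_ctr = (k_boxes[:, :2] + k_boxes[:, 2:]) / 2.0
    dist = np.linalg.norm(q_ctr[:, None, :] - k_ctr[None, :, :], axis=2)
    q_size = np.sqrt(np.prod(q_boxes[:, 2:] - q_boxes[:, :2], axis=1)) + eps
    geometry = -dist / q_size[:, None]                      # closer -> larger

    logits = appearance + geometry
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def rescore_ambiguous(boxes, scores, feats, hi=0.7, lo=0.3, alpha=0.5):
    """Step 1: split coarse proposals; step 2: re-estimate the ambiguous ones."""
    reliable = scores >= hi                                 # assumed threshold
    ambiguous = (scores >= lo) & ~reliable                  # assumed threshold
    new_scores = scores.astype(float)                       # astype returns a copy
    if reliable.any() and ambiguous.any():
        w = relation_weights(boxes[ambiguous], feats[ambiguous],
                             boxes[reliable], feats[reliable])
        context = w @ scores[reliable]                      # weighted reliable score
        new_scores[ambiguous] = (1 - alpha) * scores[ambiguous] + alpha * context
    return new_scores, reliable, ambiguous


def add_global_context(boxes, feats):
    """Augment each proposal feature with a relation-weighted sum of all features."""
    w = relation_weights(boxes, feats, boxes, feats)
    return feats + w @ feats
```

In this sketch the same pair-wise weights serve both purposes: rescore_ambiguous uses them to pull an ambiguous proposal's score toward the scores of nearby, similar-looking reliable proposals, and add_global_context uses them to attach a global feature to every proposal before recognition. A trained detector would learn these weights rather than fix them by hand.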

Acknowledgments

This project was partially supported by grants from the Natural Science Foundation of China (71671178). It was also supported by the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Jiaxu Leng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Leng, J., Liu, Y. Context augmentation for object detection. Appl Intell 52, 2621–2633 (2022). https://doi.org/10.1007/s10489-020-02037-z

  • DOI: https://doi.org/10.1007/s10489-020-02037-z

Keywords

  • Object detection
  • Region proposals
  • Contextual information