Hierarchical Context Embedding for Region-Based Object Detection

Chen, Zhao-Min; Jin, Xin; Zhao, Borui; Wei, Xiu-Shen; Guo, Yanwen

doi:10.1007/978-3-030-58589-1_38

Zhao-Min Chen^12,13,
Xin Jin¹³,
Borui Zhao¹³,
Xiu-Shen Wei¹³ &
…
Yanwen Guo¹²

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12366))

Included in the following conference series:

European Conference on Computer Vision

3762 Accesses
21 Citations

Abstract

State-of-the-art two-stage object detectors apply a classifier to a sparse set of object proposals, relying on region-wise features extracted by RoIPool or RoIAlign as inputs. The region-wise features, in spite of aligning well with the proposal locations, may still lack the crucial context information which is necessary for filtering out noisy background detections, as well as recognizing objects possessing no distinctive appearances. To address this issue, we present a simple but effective Hierarchical Context Embedding (HCE) framework, which can be applied as a plug-and-play component, to facilitate the classification ability of a series of region-based detectors by mining contextual cues. Specifically, to advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module which leverages the holistic image-level context to learn object-level concepts. Then, novel RoI features are generated by exploiting hierarchically embedded context information beneath both whole images and interested regions, which are also complementary to conventional RoI features. Moreover, to make full use of our hierarchical contextual RoI features, we propose the early-and-late fusion strategies (i.e., feature fusion and confidence fusion), which can be combined to boost the classification accuracy of region-based detectors. Comprehensive experiments demonstrate that our HCE framework is flexible and generalizable, leading to significant and consistent improvements upon various region-based detectors, including FPN, Cascade R-CNN and Mask R-CNN.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Scene Semantic Guidance for Object Detection

Feature refinement with multi-level context for object detection

Article 12 May 2023

Context augmentation for object detection

Article 17 June 2021

References

Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: CVPR, pp. 1081–1089 (2015)
Google Scholar
Bilen, H., Vedaldi, A.: Weakly supervised deep detection networks. In: CVPR, pp. 2846–2854 (2016)
Google Scholar
Byeon, W., Wang, Q., Srivastava, R.K., Koumoutsakos, P.: Contextvp: fully context-aware video prediction. In: ECCV, pp. 753–769 (2018)
Google Scholar
Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: CVPR, pp. 6154–6162 (2018)
Google Scholar
Chen, K., et al.: MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
Chen, Z., Huang, S., Tao, D.: Context refinement for object detection. In: ECCV, pp. 71–86 (2018)
Google Scholar
Cinbis, R.G., Verbeek, J., Schmid, C.: Weakly supervised object localization with multi-fold multiple instance learning. TPAMI 39(1), 189–203 (2016)
Article Google Scholar
Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: NIPS, pp. 379–387 (2016)
Google Scholar
Dai, J., et al.: Deformable convolutional networks. In: ICCV, pp. 764–773 (2017)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
Google Scholar
Ebersbach, M., Herms, R., Eibl, M.: Fusion methods for ICD10 code classification of death certificates in multilingual corpora. In: CLEF (2017)
Google Scholar
Galleguillos, C., Belongie, S.: Context based object categorization: a critical survey. CVIU 114(6), 712–722 (2010)
Google Scholar
Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448 (2015)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp. 580–587 (2014)
Google Scholar
Gunes, H., Piccardi, M.: Affect recognition from face and body: early fusion vs. late fusion. In: SMC, vol. 4, pp. 3437–3443. IEEE (2005)
Google Scholar
Harzallah, H., Jurie, F., Schmid, C.: Combining efficient object localization and image classification. In: CVPR, pp. 237–244 (2009)
Google Scholar
He, J., Deng, Z., Zhou, L., Wang, Y., Qiao, Y.: Adaptive pyramid context network for semantic segmentation. In: CVPR, pp. 7519–7528 (2019)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2961–2969 (2017)
Google Scholar
Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: ECCV, pp. 784–799 (2018)
Google Scholar
Kantorov, V., Oquab, M., Cho, M., Laptev, I.: ContextLocNet: context-aware deep network models for weakly supervised localization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 350–365. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_22
Chapter Google Scholar
Li, X., Wu, J., Lin, Z., Liu, H., Zha, H.: Recurrent squeeze-and-excitation context aggregation net for single image deraining. In: ECCV, pp. 254–269 (2018)
Google Scholar
Li, Y., Chen, Y., Wang, N., Zhang, Z.: Scale-aware trident networks for object detection. In: ICCV, pp. 6054–6063 (2019)
Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR, pp. 2117–2125 (2017)
Google Scholar
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Liu, L., et al.: Deep learning for generic object detection: a survey. IJCV 128(2), 1–30 (2019). https://doi.org/10.1007/s11263-019-01247-4
Article Google Scholar
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Luo, W.: Understanding the effective receptive field in deep convolutional neural networks. In: NIPS (2016)
Google Scholar
Qu, L., Tian, J., He, S., Tang, Y., Lau, R.W.: Deshadownet: a multi-context embedding deep network for shadow removal. In: CVPR, pp. 4067–4075 (2017)
Google Scholar
Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., Belongie, S.: Objects in context. In: ICCV, pp. 1–8 (2007)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788 (2016)
Google Scholar
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 pp. 1–6 (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
Google Scholar
Tan, Z., Nie, X., Qian, Q., Li, N., Li, H.: Learning to rank proposals for object detection. In: ICCV, pp. 8273–8281 (2019)
Google Scholar
Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: ICCV, pp. 9627–9636 (2019)
Google Scholar
Uijlings, J., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. IJCV 104(2), 154–171 (2013). https://doi.org/10.1007/252Fs11263-013-0620-5
Article Google Scholar
Wang, Z., Chen, T., Li, G., Xu, R., Lin, L.: Multi-label image recognition by recurrently discovering attentional regions. In: ICCV, pp. 464–472 (2017)
Google Scholar
Woo, S., Park, J., Lee, J.Y., So Kweon, I.: CBAM: convolutional block attention module. In: ECCV, pp. 3–19 (2018)
Google Scholar
Xu, C.D., Zhao, X.R., Jin, X., Wei, X.S.: Exploring categorical regularization for domain adaptive object detection. In: CVPR, pp. 11724–11733 (2020)
Google Scholar
Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable convnets v2: more deformable, better results. In: CVPR, pp. 9308–9316 (2019)
Google Scholar

Download references

Acknowledgements

Z.-M. Chen’s contribution was made when he was an intern in Megvii Research Nanjing. This research was supported by the National Key Research and Development Program of China under Grant 2017YFA0700800, the National Natural Science Foundation of China under Grants 61772257 and the Fundamental Research Funds for the Central Universities 020914380080.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Zhao-Min Chen & Yanwen Guo
Megvii Research Nanjing, Megvii Technology, Beijing, China
Zhao-Min Chen, Xin Jin, Borui Zhao & Xiu-Shen Wei

Authors

Zhao-Min Chen
View author publications
You can also search for this author in PubMed Google Scholar
Xin Jin
View author publications
You can also search for this author in PubMed Google Scholar
Borui Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Xiu-Shen Wei
View author publications
You can also search for this author in PubMed Google Scholar
Yanwen Guo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Xiu-Shen Wei or Yanwen Guo .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1070 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, ZM., Jin, X., Zhao, B., Wei, XS., Guo, Y. (2020). Hierarchical Context Embedding for Region-Based Object Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_38

Download citation

DOI: https://doi.org/10.1007/978-3-030-58589-1_38
Published: 12 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58588-4
Online ISBN: 978-3-030-58589-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Hierarchical Context Embedding for Region-Based Object Detection

Abstract

Access this chapter

Similar content being viewed by others

Scene Semantic Guidance for Object Detection

Feature refinement with multi-level context for object detection

Context augmentation for object detection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 1070 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Hierarchical Context Embedding for Region-Based Object Detection

Abstract

Access this chapter

Similar content being viewed by others

Scene Semantic Guidance for Object Detection

Feature refinement with multi-level context for object detection

Context augmentation for object detection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding authors

Editor information

Editors and Affiliations

1 Electronic supplementary material

Supplementary material 1 (pdf 1070 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation