End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution

Liao, Mingxiang; Wan, Fang; Yao, Yuan; Han, Zhenjun; Zou, Jialing; Wang, Yuze; Feng, Bailan; Yuan, Peng; Ye, Qixiang

doi:10.1007/978-3-031-20077-9_13

Mingxiang Liao¹²,
Fang Wan¹²,
Yuan Yao¹²,
Zhenjun Han¹²,
Jialing Zou¹²,
Yuze Wang¹³,
Bailan Feng¹³,
Peng Yuan¹³ &
…
Qixiang Ye¹²

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13669))

Included in the following conference series:

European Conference on Computer Vision

3148 Accesses
4 Citations

Abstract

Conventional methods for weakly supervised object detection (WSOD) typically enumerate dense proposals and select the discriminative proposals as objects. However, these two-stage “enumerate-and-select” methods suffer object feature ambiguity brought by dense proposals and low detection efficiency caused by the proposal enumeration procedure. In this study, we propose a sparse proposal evolution (SPE) approach, which advances WSOD from the two-stage pipeline with dense proposals to an end-to-end framework with sparse proposals. SPE is built upon a visual transformer equipped with a seed proposal generation (SPG) branch and a sparse proposal refinement (SPR) branch. SPG generates high-quality seed proposals by taking advantage of the cascaded self-attention mechanism of the visual transformer, and SPR trains the detector to predict sparse proposals which are supervised by the seed proposals in a one-to-one matching fashion. SPG and SPR are iteratively performed so that seed proposals update to accurate supervision signals and sparse proposals evolve to precise object regions. Experiments on VOC and COCO object detection datasets show that SPE outperforms the state-of-the-art end-to-end methods by 7.0% mAP and 8.1% AP50. It is an order of magnitude faster than the two-stage methods, setting the first solid baseline for end-to-end WSOD with sparse proposals. The code is available at https://github.com/MingXiangL/SPE.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Softcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Alex, K., Ilya, S., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NeurIPS, pp. 1097–1115 (2012)
Google Scholar
Arbeláez, P.A., Pont-Tuset, J., Barron, J.T., Marqués, F., Malik, J.: Multiscale combinatorial grouping. In: IEEE CVPR, pp. 328–335 (2014)
Google Scholar
Arun, A., Jawahar, C.V., Kumar, M.P.: Dissimilarity coefficient based weakly supervised object detection. In: IEEE CVPR, pp. 9432–9441 (2019)
Google Scholar
Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with posterior regularization. In: BMVC, pp. 1997–2005 (2014)
Google Scholar
Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: IEEE CVPR, pp. 1081–1089 (2015)
Google Scholar
Bilen, H., Vedaldi, A.: Weakly supervised deep detection networks. In: IEEE CVPR. pp. 2846–2854 (2016)
Google Scholar
Cao, T., Du, L., Zhang, X., Chen, S., Zhang, Y., Wang, Y.: Cat: Weakly supervised object detection with category transfer (2021)
Google Scholar
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Carreira, J., Sminchisescu, C.: CPMC: automatic object segmentation using constrained parametric min-cuts. IEEE TPAMI 34(7), 1312–1328 (2012)
Google Scholar
Carreira, J., Sminchisescu, C.: CPMC: automatic object segmentation using constrained parametric min-cuts. IEEE TPAMI 34(7), 1312–1328 (2012)
Google Scholar
Cheng, G., Yang, J., Gao, D., Guo, L., Han, J.: High-quality proposals for weakly supervised object detection. IEEE TIP 29, 5794–5804 (2020)
Google Scholar
Cheng, M., Zhang, Z., Lin, W., Torr, P.H.S.: BING: binarized normed gradients for objectness estimation at 300fps. In: IEEE CVPR, pp. 3286–3293 (2014)
Google Scholar
Chong, W., Kaiqi, H., Weiqiang, R., Junge, Z., Steve, M.: Large-scale weakly supervised object localization via latent category learning. IEEE TIP 24(4), 1371–1385 (2015)
Google Scholar
Wang, C., Ren, W., Huang, K., Tan, T.: Weakly supervised object localization with latent category learning. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 431–445. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_28
Diba, A., Sharma, V., Pazandeh, A., Pirsiavash, H., Van Gool, L.: Weakly supervised cascaded convolutional networks. In: IEEE CVPR, pp. 5131–5139 (2017)
Google Scholar
Diba, A., Sharma, V., Stiefelhagen, R., Van Gool, L.: Object discovery by generative adversarial & ranking networks. arXiv preprint arXiv:1711.08174 (2017)
Dong, B., Huang, Z., Guo, Y., Wang, Q., Niu, Z., Zuo, W.: Boosting weakly supervised object detection via learning bounding box adjusters. In: IEEE ICCV (2021)
Google Scholar
Dong, L., Bin, H.J., Yali, L., Shengjin, W., Hsuan, Y.M.: Weakly supervised object localization with progressive domain adaptation. In: IEEE CVPR, pp. 3512–3520 (2016)
Google Scholar
Fang, W., Chang, L., Wei, K., Xiangyang, J., Jianbin, J., Qixiang, Y.: CMIL: continuation multiple instance learning for weakly supervised object detection. In: IEEE CVPR (2019)
Google Scholar
Gao, W., et al.: TS-CAM: token semantic coupled attention map for weakly supervised object localization. CoRR abs/2103.14862 (2021)
Google Scholar
Gao, Y., et al.: C-MIDN: coupled multiple instance detection network with segmentation guidance for weakly supervised object detection. In: IEEE ICCV (2019)
Google Scholar
Gudi, A., van Rosmalen, N., Loog, M., van Gemert, J.C.: Object-extent pooling for weakly supervised single-shot localization. In: BMVC (2017)
Google Scholar
Huang, Z., Zou, Y., Kumar, B.V.K.V., Huang, D.: Comprehensive attention self-distillation for weakly-supervised object detection. In: NeurIPS (2020)
Google Scholar
Kantorov, V., et al.: Deep self-taught learning for weakly supervised object localization. In: IEEE CVPR, pp. 4294–4302 (2017)
Google Scholar
ContextLocNet: context-aware deep network models for weakly supervised localization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 350–365. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_22
Kosugi, S., Yamasaki, T., Aizawa, K.: Object-aware instance labeling for weakly supervised object detection. In: IEEE ICCV (2019)
Google Scholar
Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_26
Li, X., Kan, M., Shan, S., Chen, X.: Weakly supervised object detection with segmentation collaboration. In: IEEE ICCV (2019)
Google Scholar
Mark, E., Luc, V.G., KI, W.C., John, W., Andrew, Z.: The pascal visual object classes (VOC) challenge. IJCV. 88(2), 303–338 (2010)
Google Scholar
Meng, D., et al.: Conditional DETR for fast training convergence. In: IEEE ICCV, pp. 3651–3660, October 2021
Google Scholar
Oh, S.H., Jae, L.Y., Stefanie, J., Trevor, D.: Weakly supervised discovery of visual pattern configurations. In: NeurIPS, pp. 1637–1645 (2014)
Google Scholar
Oh, S.H., Ross, G., Stefanie, J., Julien, M., Zaid, H., Trevor, D.: On learning to localize objects with minimal supervision. In: ICML, pp. 1611–1619 (2014)
Google Scholar
Parthipan, S., Tao, X.: Weakly supervised object detector learning with model drift detection. In: IEEE ICCV, pp. 343–350 (2011)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NeurIPS, pp. 91–99 (2015)
Google Scholar
Ren, Z., et al.: Instance-aware, context-focused, and memory-efficient weakly supervised object detection. In: IEEE CVPR, pp. 10595–10604 (2020)
Google Scholar
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: IEEE CVPR, June 2019
Google Scholar
RR, U.J., de Sande Koen EA, V., Theo, G., WM, S.A.: Selective search for object recognition. IJCV. 104(2), 154–171 (2013)
Google Scholar
Shen, Y., Ji, R., Chen, Z., Wu, Y., Huang, F.: UWSOD: toward fully-supervised-level capacity weakly supervised object detection. In: NeurIPS (2020)
Google Scholar
Shen, Y., Ji, R., Wang, C., Li, X., Li, X.: Weakly supervised object detection via object-specific pixel gradient. IEEE TNNLS 29(12), 5960–5970 (2018)
Google Scholar
Shen, Y., Ji, R., Wang, Y., Wu, Y., Cao, L.: Cyclic guidance for weakly supervised joint detection and segmentation. In: IEEE CVPR, pp. 697–707 (2019)
Google Scholar
Singh, K.K., Lee, Y.J.: You reap what you sow: using videos to generate high precision object proposals for weakly-supervised object detection. In: IEEE CVPR, pp. 9414–9422 (2019)
Google Scholar
Tang, P., et al.: PCL: proposal cluster learning for weakly supervised object detection. IEEE TPAMI 42(1), 176–191 (2020)
Google Scholar
Tang, P., Wang, X., Bai, X., Liu, W.: Multiple instance detection network with online instance classifier refinement. In: IEEE CVPR, pp. 3059–3067 (2017)
Google Scholar
Tang, P., et al.: Weakly supervised region proposal network and object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 370–386. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_22
Chapter Google Scholar
Thomas, D., Bogdan, A., Vittorio, F.: Weakly supervised localization and learning with generic knowledge. IJCV 100(3), 275–293 (2012)
Article MathSciNet Google Scholar
Touvron, H., Cord, M., Sablayrolles, A., Synnaeve, G., Jégou, H.: Going deeper with image transformers. arXiv preprint arXiv:2103.17239 (2021)
Tsung-Yi, L., Priya, G., Ross, G., Kaiming, H., Dollár, P.: Focal loss for dense object detection. In: IEEE ICCV (2017)
Google Scholar
Wan, F., Wei, P., Jiao, J., Han, Z., Ye, Q.: Min-entropy latent model for weakly supervised object detection. In: IEEE CVPR, pp. 1297–1306 (2018)
Google Scholar
Wan, F., Wei, P., Jiao, J., Han, Z., Ye, Q.: Min-entropy latent model for weakly supervised object detection. IEEE TPAMI 41(10), 2395–2409 (2019)
Article Google Scholar
Wei, Y., et al.: TS\(^{2}\)C: tight box mining with surrounding segmentation context for weakly supervised object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 454–470. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_27
Chapter Google Scholar
Ye, Q., Wan, F., Liu, C., Huang, Q., Ji, X.: Continuation multiple instance learning for weakly and fully supervised object detection. IEEE TNNLS, pp. 1–15 (2021). https://doi.org/10.1109/TNNLS.2021.3070801
Ye, Q., Zhang, T., Qiu, Q., Zhang, B., Chen, J., Sapiro, G.: Self-learning scene-specific pedestrian detectors using a progressive latent model. In: IEEE CVPR, pp. 2057–2066 (2017)
Google Scholar
Zeng, Z., Liu, B., Fu, J., Chao, H., Zhang, L.: WSOD2: learning bottom-up and top-down objectness distillation for weakly-supervised object detection. In: IEEE ICCV (2019)
Google Scholar
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: IEEE CVPR, pp. 2921–2929 (2016)
Google Scholar

Download references

Acknowledgement

This work was supported by National Natural Science Foundation of China (NSFC) under Grant 62006216, 61836012, 62171431 and 62176260, the Strategic Priority Research Program of Chinese Academy of Sciences under Grant No. XDA27000000.

Author information

Authors and Affiliations

University of Chinese Academy of Sciences, Beijing, China
Mingxiang Liao, Fang Wan, Yuan Yao, Zhenjun Han, Jialing Zou & Qixiang Ye
Huawei Noah’s Ark Lab, Bei Jing, China
Yuze Wang, Bailan Feng & Peng Yuan

Authors

Mingxiang Liao
View author publications
You can also search for this author in PubMed Google Scholar
Fang Wan
View author publications
You can also search for this author in PubMed Google Scholar
Yuan Yao
View author publications
You can also search for this author in PubMed Google Scholar
Zhenjun Han
View author publications
You can also search for this author in PubMed Google Scholar
Jialing Zou
View author publications
You can also search for this author in PubMed Google Scholar
Yuze Wang
View author publications
You can also search for this author in PubMed Google Scholar
Bailan Feng
View author publications
You can also search for this author in PubMed Google Scholar
Peng Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Qixiang Ye
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fang Wan .

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2723 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Liao, M. et al. (2022). End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_13

Download citation

DOI: https://doi.org/10.1007/978-3-031-20077-9_13
Published: 06 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20076-2
Online ISBN: 978-3-031-20077-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution