End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13669)

Abstract

Conventional methods for weakly supervised object detection (WSOD) typically enumerate dense proposals and select the discriminative ones as objects. However, these two-stage “enumerate-and-select” methods suffer from object feature ambiguity caused by the dense proposals and from low detection efficiency caused by the proposal enumeration procedure. In this study, we propose a sparse proposal evolution (SPE) approach, which advances WSOD from a two-stage pipeline with dense proposals to an end-to-end framework with sparse proposals. SPE is built upon a visual transformer equipped with a seed proposal generation (SPG) branch and a sparse proposal refinement (SPR) branch. SPG generates high-quality seed proposals by exploiting the cascaded self-attention mechanism of the visual transformer, and SPR trains the detector to predict sparse proposals, which are supervised by the seed proposals in a one-to-one matching fashion. SPG and SPR are performed iteratively, so that the seed proposals are updated into accurate supervision signals and the sparse proposals evolve into precise object regions. Experiments on the VOC and COCO object detection datasets show that SPE outperforms the state-of-the-art end-to-end methods by 7.0% mAP and 8.1% AP50. It is also an order of magnitude faster than the two-stage methods, setting the first solid baseline for end-to-end WSOD with sparse proposals. The code is available at https://github.com/MingXiangL/SPE.
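
The one-to-one matching between seed proposals and sparse proposals mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the matching cost or the solver, so the cost used here (1 - IoU), the brute-force search, and the names `iou` and `match_one_to_one` are all assumptions chosen for readability.

```python
from itertools import permutations


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_one_to_one(seeds, proposals):
    """Assign each seed box to a distinct proposal, minimizing total (1 - IoU).

    Hypothetical helper: brute force over all assignments, so it is only
    practical for a handful of boxes.
    Returns a list of (seed_index, proposal_index) pairs.
    """
    if len(seeds) > len(proposals):
        raise ValueError("need at least as many proposals as seeds")
    best_cost, best_assignment = float("inf"), ()
    for assignment in permutations(range(len(proposals)), len(seeds)):
        cost = sum(1.0 - iou(seeds[s], proposals[p])
                   for s, p in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return list(enumerate(best_assignment))


# Toy data (made up for illustration): two seed boxes, three predicted proposals.
seeds = [(10, 10, 50, 50), (60, 60, 100, 100)]
proposals = [(58, 62, 98, 102), (0, 0, 20, 20), (12, 8, 52, 48)]
matches = match_one_to_one(seeds, proposals)
print(matches)
```

Each seed is paired with exactly one proposal and no proposal is used twice; the proposal left unmatched would typically receive no box supervision. For realistic numbers of proposals, a polynomial-time assignment solver such as the Hungarian algorithm would replace the brute-force loop.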


Acknowledgement

This work was supported by the National Natural Science Foundation of China (NSFC) under Grants 62006216, 61836012, 62171431, and 62176260, and by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA27000000.

Author information

Corresponding author

Correspondence to Fang Wan.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liao, M. et al. (2022). End-to-End Weakly Supervised Object Detection with Sparse Proposal Evolution. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_13

  • DOI: https://doi.org/10.1007/978-3-031-20077-9_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20076-2

  • Online ISBN: 978-3-031-20077-9

  • eBook Packages: Computer Science, Computer Science (R0)
