Robust Object Detection with Inaccurate Bounding Boxes

  • Conference paper in Computer Vision – ECCV 2022 (ECCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13670)

Abstract

Learning accurate object detectors often requires large-scale training data with precise object bounding boxes. However, labeling such data is expensive and time-consuming. Because crowd-sourced labeling and the inherent ambiguity of objects can introduce noisy bounding box annotations, object detectors suffer from such degraded training data. In this work, we aim to address the challenge of learning robust object detectors with inaccurate bounding boxes. Inspired by the fact that localization precision suffers significantly from inaccurate bounding boxes while classification accuracy is less affected, we propose leveraging classification as a guidance signal for refining localization results. Specifically, by treating an object as a bag of instances, we introduce an Object-Aware Multiple Instance Learning approach (OA-MIL), featuring object-aware instance selection and object-aware instance extension. The former selects accurate instances for training instead of directly using the inaccurate box annotations; the latter generates high-quality instances for selection. Extensive experiments on synthetic noisy datasets (i.e., noisy PASCAL VOC and MS-COCO) and a real noisy wheat head dataset demonstrate the effectiveness of our OA-MIL. Code is available at https://github.com/cxliu0/OA-MIL.
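
To make the bag-of-instances idea above concrete, the short Python sketch below shows one way classification scores could guide box refinement: each inaccurate annotation is expanded into a bag of jittered candidate boxes, every candidate is scored, and the highest-scoring candidate is kept as the training target. This is only a minimal illustration under assumed interfaces, not the authors' OA-MIL implementation (the official code is in the repository linked above); in particular, toy_score is a hypothetical stand-in for a detector's classification head, and jitter_box is an illustrative helper rather than part of the paper.

# Minimal sketch of classification-guided box refinement for a noisy annotation.
# NOT the authors' OA-MIL code; `toy_score` is a hypothetical classifier stand-in.

import numpy as np

def jitter_box(box, scale=0.1, count=8, rng=None):
    """Build a bag of candidates by randomly perturbing (x1, y1, x2, y2)."""
    rng = rng or np.random.default_rng(0)
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    noise = rng.uniform(-scale, scale, size=(count, 4)) * np.array([w, h, w, h])
    candidates = np.asarray(box, dtype=float) + noise
    # Keep the original (noisy) box in the bag as well.
    return np.vstack([np.asarray(box, dtype=float), candidates])

def select_instance(bag, score_fn):
    """MIL-style selection: keep the candidate with the highest classification score."""
    scores = np.array([score_fn(b) for b in bag])
    return bag[int(scores.argmax())], float(scores.max())

if __name__ == "__main__":
    noisy_box = (48.0, 60.0, 112.0, 140.0)           # inaccurate annotation (x1, y1, x2, y2)
    true_box = np.array([50.0, 64.0, 110.0, 134.0])  # unknown in practice; only used to fake scores here

    def toy_score(box):
        # Hypothetical classifier: scores a candidate by its IoU with the true object,
        # mimicking a classification head that responds to well-localized boxes.
        ix1, iy1 = np.maximum(box[:2], true_box[:2])
        ix2, iy2 = np.minimum(box[2:], true_box[2:])
        inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
        union = ((box[2] - box[0]) * (box[3] - box[1])
                 + (true_box[2] - true_box[0]) * (true_box[3] - true_box[1]) - inter)
        return inter / union

    bag = jitter_box(noisy_box, scale=0.15, count=16)
    refined, score = select_instance(bag, toy_score)
    print("refined box:", refined.round(1), "score:", round(score, 3))

In the actual method, the selection is driven by the detector's own classification branch and is complemented by object-aware instance extension, which enlarges the pool of high-quality candidates; see the paper and the linked repository for the full procedure.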

Notes

  1. https://www.kaggle.com/c/global-wheat-detection.

  2. https://www.aicrowd.com/challenges/global-wheat-challenge-2021.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grant No. 61876211, No. U1913602, and No. 62106080.

Author information

Corresponding author

Correspondence to Zhiguo Cao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1471 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, C., Wang, K., Lu, H., Cao, Z., Zhang, Z. (2022). Robust Object Detection with Inaccurate Bounding Boxes. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13670. Springer, Cham. https://doi.org/10.1007/978-3-031-20080-9_4

  • DOI: https://doi.org/10.1007/978-3-031-20080-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20079-3

  • Online ISBN: 978-3-031-20080-9

  • eBook Packages: Computer Science (R0)
