Abstract
Object detection using single point supervision has received increasing attention over the years. However, the performance gap between point supervised object detection (PSOD) and bounding box supervised detection remains large. In this paper, we attribute this gap to the failure to generate high-quality proposal bags, which are crucial for multiple instance learning (MIL). To address this problem, we introduce a lightweight alternative to the off-the-shelf proposal (OTSP) method and thereby create the Point-to-Box Network (P2BNet), which constructs an inter-object balanced proposal bag by generating proposals in an anchor-like way. By fully exploiting the accurate position information, P2BNet further constructs an instance-level bag, avoiding the mixture of multiple objects. Finally, a cascaded coarse-to-fine policy is used to improve the IoU between proposals and the ground truth (GT). Benefiting from these strategies, P2BNet is able to produce high-quality instance-level bags for object detection. P2BNet improves the mean average precision (AP) by more than 50% relative to the previous best PSOD method on the MS COCO dataset. It also demonstrates great potential to bridge the performance gap between point supervised and bounding box supervised detectors. The code will be released at www.github.com/ucas-vg/P2BNet.
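To make the idea of an anchor-like, instance-level proposal bag more concrete, the following is a minimal sketch rather than the P2BNet implementation. It assumes that each bag is built by centring boxes of several sizes and aspect ratios on one annotated point, and it scores each proposal against a ground-truth box by IoU. The function names (`point_to_proposal_bag`, `iou`, `best_iou`) and the default sizes and ratios are hypothetical choices made for illustration; the MIL-based selection and the cascaded refinement described above are not modelled here.

```python
from itertools import product


def point_to_proposal_bag(cx, cy, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Build one bag of boxes (x1, y1, x2, y2) centred on a single point.

    sizes and ratios are illustrative assumptions, not values from the paper.
    ratio is width / height; each box keeps an area of size ** 2.
    """
    bag = []
    for size, ratio in product(sizes, ratios):
        w = size * ratio ** 0.5
        h = size / ratio ** 0.5
        bag.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return bag


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_iou(bag, gt_box):
    """Highest IoU that any proposal in the bag reaches with the GT box."""
    return max(iou(box, gt_box) for box in bag)


if __name__ == "__main__":
    # One annotated point yields one bag, so objects are never mixed in a bag.
    bag = point_to_proposal_bag(100.0, 100.0)
    gt_box = (60.0, 80.0, 150.0, 125.0)  # hypothetical ground-truth box
    print(len(bag), round(best_iou(bag, gt_box), 3))
```

Because every annotated point receives the same number of proposals, the bags are balanced across objects by construction. A coarse-to-fine scheme could, under the same assumptions, repeat the sampling around the best box of the previous stage with a finer set of sizes and offsets.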
Acknowledgements
This work was supported in part by the Youth Innovation Promotion Association CAS, the National Natural Science Foundation of China (NSFC) under Grant Nos. 61836012, 61771447 and 62006244, the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA27000000, and the Young Elite Scientist Sponsorship Program of the China Association for Science and Technology (YESS20200140).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, P. et al. (2022). Point-to-Box Network for Accurate Object Detection via Single Point Supervision. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_4
DOI: https://doi.org/10.1007/978-3-031-20077-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20076-2
Online ISBN: 978-3-031-20077-9
eBook Packages: Computer Science, Computer Science (R0)