Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection

Zhou, Hongyu; Ge, Zheng; Liu, Songtao; Mao, Weixin; Li, Zeming; Yu, Haiyan; Sun, Jian

doi:10.1007/978-3-031-20077-9_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13669)

Included in the following conference series:

European Conference on Computer Vision

Abstract

To date, the most powerful semi-supervised object detectors (SS-OD) are based on pseudo-boxes, which require a sequence of post-processing steps with fine-tuned hyper-parameters. In this work, we propose replacing the sparse pseudo-boxes with dense predictions as a unified and straightforward form of pseudo-label. Compared with pseudo-boxes, our Dense Pseudo-Label (DPL) does not involve any post-processing and thus retains richer information. We also introduce a region selection technique that highlights the key information while suppressing the noise carried by dense labels. We name the proposed SS-OD algorithm, which leverages DPL, Dense Teacher. On COCO and VOC, Dense Teacher shows superior performance under various settings compared with pseudo-box-based methods. Code is available at https://github.com/Megvii-BaseDetection/DenseTeacher.
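
The idea of keeping dense teacher scores as targets and restricting them to a selected region can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how regions are chosen, so the ranking rule (per-anchor maximum class score), the function names `select_dense_region` and `dense_pseudo_labels`, and the `keep_ratio` parameter are all assumptions made for illustration.

```python
from typing import List


def select_dense_region(scores: List[List[float]], keep_ratio: float) -> List[bool]:
    """Mark the anchors whose top class score ranks in the best keep_ratio fraction.

    scores[i][c] is a (hypothetical) teacher score for class c at anchor i.
    Ranking by the per-anchor maximum is an assumption, not taken from the abstract.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Per-anchor confidence: the highest score over all classes.
    confidence = [max(row) for row in scores]
    # Always keep at least one anchor so the selected region is never empty.
    num_keep = max(1, int(keep_ratio * len(confidence)))
    # Anchor indices ordered from most to least confident.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    keep = set(ranked[:num_keep])
    return [i in keep for i in range(len(confidence))]


def dense_pseudo_labels(scores: List[List[float]], keep_ratio: float) -> List[List[float]]:
    """Keep the soft scores inside the selected region and zero them elsewhere.

    No thresholding into boxes and no NMS is applied: the targets stay dense.
    """
    mask = select_dense_region(scores, keep_ratio)
    return [list(row) if selected else [0.0] * len(row)
            for row, selected in zip(scores, mask)]


# Four anchors, two classes; keep the top half of the anchors.
teacher_scores = [[0.9, 0.1], [0.2, 0.3], [0.05, 0.6], [0.01, 0.02]]
region = select_dense_region(teacher_scores, keep_ratio=0.5)
targets = dense_pseudo_labels(teacher_scores, keep_ratio=0.5)
```

In this toy setup a student would be trained to match `targets` at every anchor, rather than a handful of boxes obtained by thresholding and suppressing the same scores.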

H. Zhou and Z. Ge contributed equally to this work.

Notes

1. See also Sect. 3.2 for a related discussion.
2. CrowdHuman is a benchmark for detecting humans in crowded scenes. Performance is measured by log-average miss rate (mMR); lower is better.
3. “Anchor” stands for “anchor point” in anchor-free detectors and “anchor box” in anchor-based detectors.
4. COCO20cls is a subset sampled from the COCO train2017 set that includes only the 20 classes shared with VOC.
5. Since the “unlabeled images” under the COCO-Standard setting actually come with annotations, we can perform label assignment on these images using the annotations. The difference between our division (\(k=1\)) and the assigned foreground is defined as hard negatives.

Author information

Authors and Affiliations

MEGVII Technology, Beijing, China
Hongyu Zhou, Zheng Ge, Songtao Liu, Weixin Mao, Zeming Li & Jian Sun
Waseda University, Tokyo, Japan
Weixin Mao
Harbin Institute of Technology, Harbin, China
Hongyu Zhou & Haiyan Yu

Corresponding author

Correspondence to Haiyan Yu.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Zhou, H. et al. (2022). Dense Teacher: Dense Pseudo-Labels for Semi-supervised Object Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_3

DOI: https://doi.org/10.1007/978-3-031-20077-9_3
Published: 06 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20076-2
Online ISBN: 978-3-031-20077-9
eBook Packages: Computer Science, Computer Science (R0)
