Semi-Supervised and Long-Tailed Object Detection with CascadeMatch

International Journal of Computer Vision

Abstract

This paper focuses on long-tailed object detection in the semi-supervised learning setting, a realistic but rarely studied problem. We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture with multi-stage detection heads and progressive confidence thresholds. To avoid tuning the thresholds manually, we design a new adaptive pseudo-label mining mechanism that automatically identifies suitable values from data. To mitigate confirmation bias, in which a model is negatively reinforced by the incorrect pseudo-labels it produces, each detection head is trained with the ensemble pseudo-labels of all detection heads. Experiments on two long-tailed datasets, LVIS and COCO-LT, demonstrate that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches on long-tailed object detection across a wide range of detection architectures. For instance, CascadeMatch outperforms Unbiased Teacher by 1.9 \(\hbox {AP}^{{\text {Fix}}}\) on LVIS when using a ResNet50-based Cascade R-CNN structure, and by 1.7 \(\hbox {AP}^{{\text {Fix}}}\) when using Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can handle the challenging sparsely annotated object detection problem. Code: https://github.com/yuhangzang/CascadeMatch.
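The abstract describes two mechanisms: per-class pseudo-label confidence thresholds mined from the data rather than hand-tuned, and pseudo-labels ensembled across the cascade heads so that no head is trained only on its own predictions. The minimal PyTorch sketch below illustrates the general idea only; the function names, the margin parameter delta, and the specific threshold rule are hypothetical stand-ins, not the authors' implementation (see the linked repository for that).

    import torch

    def mine_class_thresholds(class_probs, delta=0.1):
        # class_probs: (N, C) softmax scores for N proposals over C classes.
        # Returns a (C,) tensor of per-class thresholds, here set to the mean
        # top score of each predicted class minus a margin delta (a stand-in
        # for the paper's adaptive pseudo-label mining rule).
        scores, labels = class_probs.max(dim=1)
        num_classes = class_probs.shape[1]
        thresholds = torch.zeros(num_classes)
        for c in range(num_classes):
            mask = labels == c
            if mask.any():
                thresholds[c] = (scores[mask].mean() - delta).clamp(0.0, 1.0)
        return thresholds

    def ensemble_heads(head_probs):
        # Average classification scores over all cascade heads; pseudo-labels
        # are then selected from this ensemble rather than from a single head.
        return torch.stack(head_probs, dim=0).mean(dim=0)

    # Toy usage: 3 cascade heads, 8 proposals, 5 classes.
    heads = [torch.softmax(torch.randn(8, 5), dim=1) for _ in range(3)]
    probs = ensemble_heads(heads)
    thr = mine_class_thresholds(probs)
    scores, labels = probs.max(dim=1)
    keep = scores >= thr[labels]  # proposals promoted to pseudo-labels
    print("pseudo-labeled proposals:", keep.nonzero(as_tuple=True)[0].tolist())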


Data Availability

The datasets analysed during this study, LVIS and COCO, are publicly available for research purposes.

Notes

  1. For simplicity, we use a single proposal in our formulations, which can be easily extended to a batch of proposals.

  2. With a slight abuse of notation, \({\varvec{b}}_1\) in \(p_2(y | {\varvec{x}}, {\varvec{b}}_1)\) contains the complete coordinates of the bounding box rather than the regressed offsets.
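For context, the cascade-stage notation used in these notes follows the standard Cascade R-CNN-style chain, in which stage \(t\) refines the box produced by stage \(t-1\) and classifies the proposal conditioned on it. The display below is a generic restatement of that chain in the notes' notation, not the paper's exact formulation:

\[
{\varvec{b}}_t = f_t({\varvec{x}}, {\varvec{b}}_{t-1}), \qquad p_t(y \mid {\varvec{x}}, {\varvec{b}}_{t-1}), \qquad t = 1, \dots, T,
\]

where \({\varvec{b}}_0\) denotes the initial region proposal and \(T\) is the number of cascade stages.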

References

  • Arazo, E., Ortego, D., Albert, P., O’Connor, N. E. & McGuinness, K. (2020). Pseudo-labeling and confirmation bias in deep semi-supervised learning. In IJCNN.

  • Bachman, P., Alsharif, O. & Precup, D. (2014). Learning with pseudo-ensembles. In NeurIPS.

  • Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A. & Raffel, C. (2019). MixMatch: a holistic approach to semi-supervised learning. In NeurIPS.

  • Berthelot, D., Carlini, N., Cubuk, E. D., Kurakin, A., Sohn, K., Zhang, H. & Raffel, C. (2020). Remixmatch: semi-supervised learning with distribution alignment and augmentation anchoring. In ICLR.

  • Cai, Z. & Vasconcelos, N. (2019). Cascade R-CNN: high quality object detection and instance segmentation. TPAMI.

  • Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A. & Zagoruyko, S. (2020). End-to-end object detection with transformers. In ECCV.

  • Chang, N., Yu, Z., Wang, Y. X., Anandkumar, A., Fidler, S., & Alvarez, J. M. (2021). Image-level or object-level? a tale of two resampling strategies for long-tailed detection. In ICML.

  • Chen, B., Chen, W., Yang, S., Xuan, Y., Song, J., Xie, D., Pu, S., Song, M. & Zhuang, Y. (2022a). Label matching semi-supervised object detection. In CVPR.

  • Chen, B., Li, P., Chen, X., Wang, B., Zhang, L., Hua, X. S. (2022b). Dense learning based semi-supervised object detection. In CVPR.

  • Dave, A., Dollár, P., Ramanan, D., Kirillov, A. & Girshick, R. (2021). Evaluating large-vocabulary object detectors: the devil is in the details. arXiv preprint arXiv:2102.01066.

  • Fan, Y., Dai, D., Kukleva, A. & Schiele, B. (2022). CoSSL: co-learning of representation and classifier for imbalanced semi-supervised learning. In CVPR.

  • Feng, C., Zhong, Y. & Huang, W. (2021). Exploring classification equilibrium in long-tailed object detection. In ICCV.

  • Gao, J., Wang, J., Dai, S., Li, L. J. & Nevatia, R. (2019). NOTE-RCNN: noise tolerant ensemble rcnn for semi-supervised object detection. In CVPR.

  • Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T. Y., Cubuk, E. D., Le, Q. V. & Zoph, B. (2021). Simple copy-paste is a strong data augmentation method for instance segmentation. In CVPR.

  • Guo, Q., Mu, Y., Chen, J., Wang, T., Yu, Y. & Luo, P. (2022). Scale-equivalent distillation for semi-supervised object detection. In CVPR.

  • Gupta, A., Dollar, P. & Girshick, R. (2019). LVIS: a dataset for large vocabulary instance segmentation. In CVPR.

  • Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I. & Sugiyama, M. (2018). Co-teaching: robust training of deep neural networks with extremely noisy labels. In NeurIPS.

  • He, K., Zhang, X., Ren, S. & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  • He, R., Yang, J. & Qi, X. (2021). Re-distributing biased pseudo labels for semi-supervised semantic segmentation: a baseline investigation. In ICCV.

  • He, Y. Y., Zhang, P., Wei, X. S., Zhang, X. & Sun, J. (2022). Relieving long-tailed instance segmentation via pairwise class balance. In CVPR.

  • Hu, H., Wei, F., Hu, H., Ye, Q., Cui, J. & Wang, L. (2021). Semi-supervised semantic segmentation via adaptive equalization learning. In NeurIPS.

  • Hu, X., Jiang, Y., Tang, K., Chen, J., Miao, C. & Zhang, H. (2020). Learning to segment the tail. In CVPR.

  • Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR.

  • Hyun, M., Jeong, J. & Kwak, N. (2020). Class-imbalanced semi-supervised learning. arXiv preprint arXiv:2002.06815.

  • Iscen, A., Tolias, G., Avrithis, Y. & Chum, O. (2019). Label propagation for deep semi-supervised learning. In CVPR.

  • Jeong, J., Lee, S., Kim, J. & Kwak, N. (2019). Consistency-based semi-supervised learning for object detection. In NeurIPS.

  • Jeong, J., Verma, V., Hyun, M., Kannala, J. & Kwak, N. (2021). Interpolation-based semi-supervised learning for object detection. In CVPR.

  • Kim, J., Hur, Y., Park, S., Yang, E., Hwang, S. J. & Shin, J. (2020). Distribution aligning refinery of pseudo-label for imbalanced semi-supervised learning. In NeurIPS.

  • Laine, S. & Aila, T. (2017). Temporal ensembling for semi-supervised learning. In ICLR.

  • Lee, D. H. (2013). Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In ICML Workshops.

  • Lee, H., Shin, S. & Kim, H. (2021). ABC: auxiliary balanced classifier for class-imbalanced semi-supervised learning. In NeurIPS.

  • Li, A., Yuan, P. & Li, Z. (2022a). Semi-supervised object detection via multi-instance alignment with global class prototypes. In CVPR.

  • Li, B., Yao, Y., Tan, J., Zhang, G., Yu, F., Lu, J. & Luo, Y. (2022b). Equalized focal loss for dense long-tailed object detection. In CVPR.

  • Li, H., Wu, Z., Shrivastava, A. & Davis, L. S. (2022c). Rethinking pseudo labels for semi-supervised object detection. In AAAI.

  • Li, S., Gong, K., Liu, C. H., Wang, Y., Qiao, F. & Cheng, X. (2021). MetaSAug: meta semantic augmentation for long-tailed visual recognition. In CVPR.

  • Li, Y., Huang, D., Qin, D., Wang, L. & Gong, B. (2020a). Improving object detection with selective self-supervised self-training. In ECCV.

  • Li, Y., Wang, T., Kang, B., Tang, S., Wang, C., Li, J. & Feng, J. (2020b). Overcoming classifier imbalance for long-tail object detection with balanced group softmax. In CVPR.

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. & Zitnick, C. L. (2014). Microsoft COCO: common objects in context. In ECCV.

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. (2017a). Feature pyramid networks for object detection. In CVPR.

  • Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. (2017b). Focal loss for dense object detection. In ICCV.

  • Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y. & Berg, A. C. (2016). SSD: Single shot multibox detector. In ECCV.

  • Liu, Y. C., Ma, C. Y., He, Z., Kuo, C. W., Chen, K., Zhang, P., Wu, B., Kira, Z. & Vajda, P. (2021a). Unbiased teacher for semi-supervised object detection. In ICLR.

  • Liu, Y. C., Ma, C. Y. & Kira, Z. (2022). Unbiased teacher v2: semi-supervised object detection for anchor-free and anchor-based detectors. In CVPR.

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. & Guo, B. (2021b). Swin transformer: hierarchical vision transformer using shifted windows. In ICCV.

  • Mi, P., Lin, J., Zhou, Y., Shen, Y., Luo, G., Sun, X., Cao, L., Fu, R., Xu, Q. & Ji, R. (2022). Active teacher for semi-supervised object detection. In CVPR.

  • Misra, I., Shrivastava, A. & Hebert, M. (2015). Watch and learn: semi-supervised learning for object detectors from video. In CVPR.

  • Oh, Y., Kim, D. J. & Kweon, I. S. (2022). Distribution-aware semantics-oriented pseudo-label for imbalanced semi-supervised learning. In CVPR.

  • Qiao, S., Shen, W., Zhang, Z., Wang, B. & Yuille, A. (2018). Deep co-training for semi-supervised image recognition. In ECCV.

  • Rasmus, A., Valpola, H., Honkala, M., Berglund, M. & Raiko, T. (2016). Semi-supervised learning with ladder networks. In NeurIPS.

  • Ren, J., Yu, C., Sheng, S., Ma, X., Zhao, H., Yi, S. & Li, H. (2020). Balanced meta-softmax for long-tailed visual recognition. In NeurIPS.

  • Ren, S., He, K., Girshick, R. B. & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS.

  • Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I. & Savarese, S. (2019). Generalized intersection over union: a metric and a loss for bounding box regression. In CVPR.

  • Rizve, M. N., Duarte, K., Rawat, Y. S. & Shah, M. (2021). In defense of pseudo-labeling: an uncertainty-aware pseudo-label selection framework for semi-supervised learning. In ICLR.

  • Rosenberg, C., Hebert, M. & Schneiderman, H. (2005). Semi-supervised self-training of object detection models. In WACV.

  • RoyChowdhury, A., Chakrabarty, P., Singh, A., Jin, S., Jiang, H., Cao, L. & Learned-Miller, E. (2019). Automatic adaptation of object detectors to new domains using self-training. In CVPR.

  • Sajjadi, M., Javanmardi, M. & Tasdizen, T. (2016). Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In NeurIPS.

  • Shen, L., Lin, Z. & Huang, Q. (2016). Relay backpropagation for effective learning of deep convolutional neural networks. In ECCV.

  • Sohn, K., Berthelot, D., Carlini, N., Zhang, Z., Zhang, H., Raffel, C. A., Cubuk, E. D., Kurakin, A. & Li, C. L. (2020a). FixMatch: simplifying semi-supervised learning with consistency and confidence. In NeurIPS.

  • Sohn, K., Zhang, Z., Li, C. L., Zhang, H., Lee, C. Y. & Pfister, T. (2020b). A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757.

  • Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., Tomizuka, M., Li, L., Yuan, Z., Wang, C., et al. (2021). Sparse R-CNN: end-to-end object detection with learnable proposals. In CVPR.

  • Tan, J., Wang, C., Li, B., Li, Q., Ouyang, W., Yin, C. & Yan, J. (2020). Equalization loss for long-tailed object recognition. In CVPR.

  • Tan, J., Lu, X., Zhang, G., Yin, C. & Li, Q. (2021). Equalization loss v2: a new gradient balance approach for long-tailed object detection. In CVPR.

  • Tang, P., Ramaiah, C., Wang, Y., Xu, R. & Xiong, C. (2021a). Proposal learning for semi-supervised object detection. In WACV.

  • Tang, Y., Wang, J., Gao, B., Dellandréa, E., Gaizauskas, R. & Chen, L. (2016). Large scale semi-supervised object detection using visual and semantic knowledge transfer. In CVPR.

  • Tang, Y., Chen, W., Luo, Y. & Zhang, Y. (2021b). Humble teachers teach better students for semi-supervised object detection. In CVPR.

  • Tarvainen, A. & Valpola, H. (2017). Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In NeurIPS.

  • Tian, Z., Shen, C., Chen, H. & He, T. (2019). FCOS: fully convolutional one-stage object detection. In ICCV.

  • Wang, J., Zhang, W., Zang, Y., Cao, Y., Pang, J., Gong, T., Chen, K., Liu, Z., Loy, C. C. & Lin, D. (2021a). Seesaw loss for long-tailed instance segmentation. In CVPR.

  • Wang, K., Yan, X., Zhang, D., Zhang, L. & Lin, L. (2018). Towards human-machine cooperation: self-supervised sample mining for object detection. In CVPR.

  • Wang, T., Li, Y., Kang, B., Li, J., Liew, J., Tang, S., Hoi, S. & Feng, J. (2020). The devil is in classification: a simple framework for long-tail instance segmentation. In ECCV.

  • Wang, T., Yang, T., Cao, J. & Zhang, X. (2021b). Co-mining: self-supervised learning for sparsely annotated object detection. In AAAI.

  • Wang, T., Zhu, Y., Zhao, C., Zeng, W., Wang, J. & Tang, M. (2021c). Adaptive class suppression loss for long-tail object detection. In CVPR.

  • Wang, W., Xie, E., Li, X., Fan, D. P., Song, K., Liang, D., Lu, T., Luo, P. & Shao, L. (2021d). Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. In ICCV.

  • Wang, Z., Li, Y., Guo, Y., Fang, L. & Wang, S. (2021e). Data-uncertainty guided multi-phase learning for semi-supervised object detection. In CVPR.

  • Wei, C., Sohn, K., Mellina, C., Yuille, A. & Yang, F. (2021). CReST: A class-rebalancing self-training framework for imbalanced semi-supervised learning. In CVPR.

  • Wu, J., Song, L., Wang, T., Zhang, Q. & Yuan, J. (2020). Forest R-CNN: large-vocabulary long-tailed object detection and instance segmentation. In ACM MM.

  • Wu, Z., Bodla, N., Singh, B., Najibi, M., Chellappa, R. & Davis, L. S. (2019). Soft sampling for robust object detection. In BMVC.

  • Xie, Q., Dai, Z., Hovy, E., Luong, T. & Le, Q. (2020a). Unsupervised data augmentation for consistency training. In NeurIPS.

  • Xie, Q., Luong, M. T., Hovy, E. & Le, Q. V. (2020b). Self-training with noisy student improves imagenet classification. In CVPR.

  • Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X. & Liu, Z. (2021). End-to-end semi-supervised object detection with soft teacher. In ICCV.

  • Yang, F., Wu, K., Zhang, S., Jiang, G., Liu, Y., Zheng, F., Zhang, W., Wang, C. & Zeng, L. (2022). Class-aware contrastive semi-supervised learning. In CVPR.

  • Yang, Q., Wei, X., Wang, B., Hua, X. S. & Zhang, L. (2021). Interactive self-training with mean teachers for semi-supervised object detection. In CVPR.

  • Yang, Y. & Xu, Z. (2020). Rethinking the value of labels for improving class-imbalanced learning. In NeurIPS.

  • Zang, Y., Huang, C. & Loy, C. C. (2021). FASA: feature augmentation and sampling adaptation for long-tailed instance segmentation. In ICCV.

  • Zhang, C., Pan, T. Y., Li, Y., Hu, H., Xuan, D., Changpinyo, S., Gong, B. & Chao, W. L. (2021a). MosaicOS: a simple and effective use of object-centric images for long-tailed object detection. In ICCV.

  • Zhang, F., Pan, T. & Wang, B. (2022). Semi-supervised object detection with adaptive class-rebalancing self-training. In AAAI.

  • Zhang, H., Chen, F., Shen, Z., Hao, Q., Zhu, C. & Savvides, M. (2020). Solving missing-annotation object detection with background recalibration loss. In ICASSP.

  • Zhang, S., Li, Z., Yan, S., He, X. & Sun, J. (2021b). Distribution alignment: a unified framework for long-tail visual recognition. In CVPR.

  • Zhang, Y., Kang, B., Hooi, B., Yan, S. & Feng, J. (2021c). Deep long-tailed learning: a survey. arXiv preprint arXiv:2110.04596.

  • Zheng, M., You, S., Huang, L., Wang, F., Qian, C. & Xu, C. (2022). SimMatch: semi-supervised learning with similarity matching. In CVPR.

  • Zhou, Q., Yu, C., Wang, Z., Qian, Q. & Li, H. (2021a). Instant-teaching: an end-to-end semi-supervised object detection framework. In CVPR.

  • Zhou, X., Koltun, V. & Krähenbühl, P. (2021b). Probabilistic two-stage detection. In CVPR.


Acknowledgements

This study is supported under the RIE2020 Industry Alignment Fund Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). It is also partly supported by the NTU NAP grant and Singapore MOE AcRF Tier 2 (MOE-T2EP20120-0001).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chen Change Loy.

Additional information

Communicated by Bumsub Ham.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zang, Y., Zhou, K., Huang, C. et al. Semi-Supervised and Long-Tailed Object Detection with CascadeMatch. Int J Comput Vis 131, 987–1001 (2023). https://doi.org/10.1007/s11263-022-01738-x

