Object Detection with a Unified Label Space from Multiple Datasets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12359)


Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant: application-relevant categories can be picked and merged from arbitrary existing datasets. However, naively merging datasets is not possible in this case because of inconsistent object annotations. Consider an object category like face that is annotated in one dataset but not in another, although the object itself appears in the latter's images. Such a category would thus be considered foreground in one dataset but background in the other. To address this challenge, we design a framework that works with such partial annotations, and we exploit a pseudo-labeling approach adapted to our specific case. We propose loss functions that carefully integrate partial but correct annotations with complementary but noisy pseudo labels. Evaluation in the proposed novel setting requires full annotation on the test set. We collect the required annotations and define a new, challenging experimental setup for this task based on existing public datasets. We show improved performance compared to competitive baselines and appropriate adaptations of existing work. (This work was part of Xiangyun Zhao's internship at NEC Labs America.)
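The core difficulty described above can be made concrete with a per-class masked loss: classes inside a source dataset's label space use the trusted ground-truth label, while classes outside it are supervised only by a down-weighted pseudo label rather than being forced to background. The sketch below illustrates this idea for a single region proposal with per-class sigmoid scores; the function name, the `pseudo_weight` parameter, and the exact weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
import math

def masked_detection_loss(scores, targets, annotated, pseudo, pseudo_weight=0.5):
    """Per-class binary cross-entropy for one region proposal.

    scores    : predicted per-class probabilities over the unified label space
    targets   : ground-truth labels (1 = object of that class), trusted only
                for classes the source dataset actually annotates
    annotated : mask, 1 if the class is in the source dataset's label space
    pseudo    : pseudo labels for the remaining classes (noisy, down-weighted)
    """
    loss = 0.0
    for p, t, a, q in zip(scores, targets, annotated, pseudo):
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp for numerical stability
        bce_gt = -(t * math.log(p) + (1 - t) * math.log(1 - p))
        bce_pl = -(q * math.log(p) + (1 - q) * math.log(1 - p))
        # Trusted annotation where the class is in the label space; elsewhere a
        # down-weighted pseudo label, never a hard "background" assignment.
        loss += a * bce_gt + (1 - a) * pseudo_weight * bce_pl
    return loss
```

With `pseudo_weight=0.0` this reduces to simply ignoring out-of-label-space classes, which is the simplest baseline for training on partial annotations; a nonzero weight lets the noisy pseudo labels contribute complementary supervision.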



This work was supported in part by National Science Foundation grants IIS-1619078 and IIS-1815561.

Supplementary material

Supplementary material 1: 504468_1_En_11_MOESM1_ESM.pdf (PDF, 3.1 MB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Northwestern University, Evanston, USA
  2. NEC Labs America, Princeton, USA
  3. UC San Diego, San Diego, USA
