TS\(^{2}\)C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

  • Yunchao Wei
  • Zhiqiang Shen
  • Bowen Cheng
  • Honghui Shi
  • Jinjun Xiong
  • Jiashi Feng
  • Thomas Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11215)


This work provides a simple approach, called Tight box mining with Surrounding Segmentation Context (TS\(^{2}\)C), for discovering tight object bounding boxes with only image-level supervision. We observe that object candidates mined by current multiple instance learning methods are usually trapped on discriminative object parts rather than covering the entire object. TS\(^{2}\)C leverages surrounding segmentation context derived from weakly supervised segmentation to suppress such low-quality, distracting candidates and boost the high-quality ones. Specifically, TS\(^{2}\)C is built on two key properties of desirable bounding boxes: (1) high purity, meaning most pixels in the box have a high object response, and (2) high completeness, meaning the box covers the high-object-response pixels comprehensively. With these novel and computable criteria, tighter candidates can be discovered for learning a better object detector. With TS\(^{2}\)C, we obtain 48.0% and 44.4% mAP on the VOC 2007 and 2012 benchmarks, respectively, which are new state-of-the-art results.
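The abstract states the purity and completeness criteria only qualitatively. The sketch below shows one way they could be turned into a single score, under assumptions that are not taken from the text above: purity is approximated as the mean segmentation confidence inside a box, completeness is approximated by penalizing the mean confidence in a thin ring surrounding the box, and the two are combined by subtraction. The names `region_sum`, `enlarge`, `tightness_score`, and the `margin` parameter are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, end-exclusive.
Box = Tuple[int, int, int, int]


def region_sum(conf: List[List[float]], box: Box) -> Tuple[float, int]:
    """Return (sum of confidences, pixel count) inside box, clipped to the map."""
    h, w = len(conf), len(conf[0])
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    total, count = 0.0, 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            total += conf[y][x]
            count += 1
    return total, count


def enlarge(box: Box, margin: int) -> Box:
    """Grow the box by `margin` pixels on every side (assumed context size)."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def tightness_score(conf: List[List[float]], box: Box, margin: int = 1) -> float:
    """Assumed score: mean response inside the box minus mean response in the ring."""
    in_sum, in_n = region_sum(conf, box)
    out_sum, out_n = region_sum(conf, enlarge(box, margin))
    ring_n = out_n - in_n
    purity = in_sum / in_n if in_n else 0.0
    context = (out_sum - in_sum) / ring_n if ring_n else 0.0
    return purity - context


# Toy 6x6 confidence map with a 4x4 object occupying rows and columns 1..4.
conf = [[1.0 if 1 <= y <= 4 and 1 <= x <= 4 else 0.0 for x in range(6)]
        for y in range(6)]

tight = (1, 1, 5, 5)  # covers exactly the object
part = (2, 2, 4, 4)   # covers only an inner part of the object
loose = (0, 0, 6, 6)  # covers the object plus background

scores = {name: tightness_score(conf, b)
          for name, b in [("tight", tight), ("part", part), ("loose", loose)]}
```

On this toy map the part-only box is pure but is penalized because its surrounding ring still has a high response, while the loose box is penalized by the background it contains, so the tight box ranks first under this assumed score.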


Weakly-supervised learning · Object detection · Semantic segmentation



This work is supported in part by the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR), a research collaboration as part of the IBM AI Horizons Network; NUS IDS R-263-000-C67-646; ECRA R-263-000-C87-133; MOE Tier-II R-263-000-D17-112; and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00341. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yunchao Wei (1), email author
  • Zhiqiang Shen (1, 2)
  • Bowen Cheng (1)
  • Honghui Shi (3)
  • Jinjun Xiong (3)
  • Jiashi Feng (4)
  • Thomas Huang (1)

  1. University of Illinois at Urbana-Champaign, Urbana, USA
  2. Fudan University, Shanghai, China
  3. IBM T.J. Watson Research Center, Yorktown Heights, USA
  4. National University of Singapore, Singapore, Singapore
