Weakly Supervised Region Proposal Network and Object Detection

  • Peng Tang
  • Xinggang Wang
  • Angtian Wang
  • Yongluan Yan
  • Wenyu Liu (email author)
  • Junzhou Huang
  • Alan Yuille
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11215)


The Convolutional Neural Network (CNN) based region proposal generation method (i.e., the region proposal network), trained using bounding box annotations, is an essential component of modern fully supervised object detectors. However, Weakly Supervised Object Detection (WSOD) has not benefited from CNN-based proposal generation because bounding box annotations are absent, and it still relies on standard proposal generation methods such as selective search. In this paper, we propose a weakly supervised region proposal network that is trained using only image-level annotations. The network consists of two stages. The first stage evaluates the objectness scores of sliding window boxes by exploiting the low-level information in the CNN, and the second stage refines the proposals from the first stage using a region-based CNN classifier. Our proposed region proposal network is suitable for WSOD, can be plugged into a WSOD network easily, and can share its convolutional computations with the WSOD network. Experiments on the PASCAL VOC and ImageNet detection datasets show that our method achieves state-of-the-art performance for WSOD, with a performance gain of about \(3\%\) on average.


Keywords: Object detection, Region proposal, Weakly supervised learning, Convolutional neural network



We really appreciate the enormous help from Yan Wang, Wei Shen, Zhishuai Zhang, Yuyin Zhou, and Baoguang Shi during the paper writing and rebuttal. This work was partly supported by NSFC (No. 61733007, No. 61503145, No. 61572207), ONR N00014-15-1-2356, and the China Scholarship Council. Xinggang Wang was sponsored by the CCF-Tencent Open Research Fund, the Hubei Scientific and Technical Innovation Key Project, and the Program for HUST Academic Frontier Youth Team.

Supplementary material

Supplementary material 1 (PDF, 730 KB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Peng Tang (1)
  • Xinggang Wang (1)
  • Angtian Wang (1)
  • Yongluan Yan (1)
  • Wenyu Liu (1, email author)
  • Junzhou Huang (2, 3)
  • Alan Yuille (4)
  1. School of EIC, Huazhong University of Science and Technology, Wuhan, China
  2. Tencent AI Lab, Shenzhen, China
  3. Department of CSE, University of Texas at Arlington, Arlington, USA
  4. Department of Computer Science, The Johns Hopkins University, Baltimore, USA
