Advertisement

ML-LocNet: Improving Object Localization with Multi-view Learning Network

  • Xiaopeng Zhang
  • Yang Yang
  • Jiashi Feng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

This paper addresses Weakly Supervised Object Localization (WSOL) with only image-level supervision. We propose a Multi-view Learning Localization Network (ML-LocNet) by incorporating multi-view learning into a two-phase WSOL model. The multi-view learning would benefit localization due to the complementary relationships among the learned features from different views and the consensus property among the mined instances from each view. In the first phase, the representation is augmented by integrating features learned from multiple views, and in the second phase, the model performs multi-view co-training to enhance localization performance of one view with the help of instances mined from other views, which thus effectively avoids early fitting. ML-LocNet can be easily combined with existing WSOL models to further improve the localization accuracy. Its effectiveness has been proved experimentally. Notably, it achieves \(68.6\%\) CorLoc and \(49.7\%\) mAP on PASCAL VOC 2007, surpassing the state-of-the-arts by a large margin.

Keywords

Weakly supervised learning Object localization Multi-view learning Object instance mining 

Notes

Acknowledgements

The work was supported in part to Jiashi Feng by NUS IDS R-263-000-C67-646, ECRA R-263-000-C87-133 and MOE Tier-II R-263-000-D17-112, in part to Yang Yang by NSFC under Project 61572108.

References

  1. 1.
    Bilen, H., Vedaldi, A.: Weakly supervised deep detection networks. In: CVPR, pp. 2846–2854 (2016)Google Scholar
  2. 2.
    Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Computational Learning Theory, pp. 92–100. ACM (1998)Google Scholar
  3. 3.
    Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: BMVC (2014)Google Scholar
  4. 4.
    Cheng, J., Wang, K.: Active learning for image retrieval with Co-SVM. Pattern Recogn. 40(1), 330–334 (2007)CrossRefGoogle Scholar
  5. 5.
    Cinbis, R.G., Verbeek, J., Schmid, C.: Multi-fold mil training for weakly supervised object localization. In: CVPR, pp. 2409–2416 (2014)Google Scholar
  6. 6.
    Deselaers, T., Alexe, B., Ferrari, V.: Weakly supervised localization and learning with generic knowledge. IJCV 100(3), 275–293 (2012)MathSciNetCrossRefGoogle Scholar
  7. 7.
    Diba, A., Sharma, V., Pazandeh, A., Pirsiavash, H., Van Gool, L.: Weakly supervised cascaded convolutional networks. In: CVPR, pp. 914–922 (2017)Google Scholar
  8. 8.
    Diba, A., Sharma, V., Stiefelhagen, R., Van Gool, L.: Object discovery by generative adversarial & ranking networks. arXiv preprint arXiv:1711.08174 (2017)
  9. 9.
    Dietterich, T.G., Lathrop, R.H., Lozano-Pérez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artifi. Intell. 89(1), 31–71 (1997)CrossRefGoogle Scholar
  10. 10.
    Everingham, M., Eslami, S.A., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. IJCV 111(1), 98–136 (2015)CrossRefGoogle Scholar
  11. 11.
    Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. IJCV 88(2), 303–338 (2010)CrossRefGoogle Scholar
  12. 12.
    Feng, H., Shi, R., Chua, T.S.: A bootstrapping framework for annotating and retrieving www images. In: ACM Multimedia, pp. 960–967. ACM (2004)Google Scholar
  13. 13.
    Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448 (2015)Google Scholar
  14. 14.
    Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 340–353. Springer, Heidelberg (2012).  https://doi.org/10.1007/978-3-642-33712-3_25CrossRefGoogle Scholar
  15. 15.
    Jie, Z., Wei, Y., Jin, X., Feng, J., Liu, W.: Deep self-taught learning for weakly supervised object localization. In: CVPR (2017)Google Scholar
  16. 16.
    Kantorov, V., Oquab, M., Cho, M., Laptev, I.: ContextLocNet: context-aware deep network models for weakly supervised localization. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 350–365. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46454-1_22CrossRefGoogle Scholar
  17. 17.
    Li, D., Huang, J.B., Li, Y., Wang, S., Yang, M.H.: Weakly supervised object localization with progressive domain adaptation. In: CVPR, pp. 3512–3520 (2016)Google Scholar
  18. 18.
    Li, Y., Liu, L., Shen, C., Van Den Hengel, A.: Mining mid-level visual patterns with deep CNN activations. IJCV 121(3), 344–364 (2017)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10602-1_48CrossRefGoogle Scholar
  20. 20.
    Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016).  https://doi.org/10.1007/978-3-319-46448-0_2CrossRefGoogle Scholar
  21. 21.
    Nguyen, M.H., Torresani, L., de la Torre, F., Rother, C.: Weakly supervised discriminative localization and classification: a joint learning process. In: Proceedings of International Conference on Computer Vision, pp. 1925–1932 (2009)Google Scholar
  22. 22.
    Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Is object localization for free? - weakly-supervised learning with convolutional neural networks. In: CVPR, pp. 685–694 (2015)Google Scholar
  23. 23.
    Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788 (2016)Google Scholar
  24. 24.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)Google Scholar
  25. 25.
    Song, H.O., Lee, Y.J., Jegelka, S., Darrell, T.: Weakly-supervised discovery of visual pattern configurations. In: NIPS, pp. 1637–1645 (2014)Google Scholar
  26. 26.
    Tang, P., Wang, X., Bai, X., Liu, W.: Multiple instance detection network with online instance classifier refinement. In: CVPR, July 2017Google Scholar
  27. 27.
    Vijayanarasimhan, S., Grauman, K.: Keywords to visual categories: multiple-instance learning for weakly supervised object categorization. In: CVPR, pp. 1–8. IEEE (2008)Google Scholar
  28. 28.
    Wang, C., Ren, W., Huang, K., Tan, T.: Weakly supervised object localization with latent category learning. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 431–445. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10599-4_28CrossRefGoogle Scholar
  29. 29.
    Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: CVPR, pp. 2921–2929. IEEE (2016)Google Scholar
  30. 30.
    Zhu, Y., Zhou, Y., Ye, Q., Qiu, Q., Jiao, J.: Soft proposal networks for weakly supervised object localization. arXiv preprint arXiv:1709.01829 (2017)
  31. 31.
    Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 391–405. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10602-1_26CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.National University of SingaporeSingaporeSingapore
  2. 2.University of Electronic Science and Technology of ChinaChengduChina

Personalised recommendations