Advertisement

Image Co-localization by Mimicking a Good Detector’s Confidence Score Distribution

  • Yao Li
  • Lingqiao Liu
  • Chunhua ShenEmail author
  • Anton van den Hengel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9906)

Abstract

Given a set of images containing objects from the same category, the task of image co-localization is to identify and localize each instance. This paper shows that this problem can be solved by a simple but intriguing idea, that is, a common object detector can be learnt by making its detection confidence scores distributed like those of a strongly supervised detector. More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them. Thus, we devise an entropy-based objective function to enforce the above property when learning the common object detector. Once the detector is learnt, we resort to a segmentation approach to refine the localization. We show that despite its simplicity, our approach outperforms state-of-the-arts.

Keywords

Image co-localization Unsupervised object discovery 

Notes

Acknowledgements

This work was in part supported by the Data to Decisions CRC Centre. C. Shen’s participation was in part supported by ARC Future Fellowship No. FT120100969.

References

  1. 1.
    Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1081–1089 (2015)Google Scholar
  2. 2.
    Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001)CrossRefGoogle Scholar
  3. 3.
    Chen, X., Shrivastava, A., Gupta, A.: Enriching visual knowledge bases via object discovery and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2035–2042 (2014)Google Scholar
  4. 4.
    Cho, M., Kwak, S., Schmid, C., Ponce, J.: Unsupervised object discovery and localization in the wild: part-based matching with bottom-up region proposals. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1201–1210 (2015)Google Scholar
  5. 5.
    Cinbis, R.G., Verbeek, J.J., Schmid, C.: Multi-fold MIL training for weakly supervised object localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2409–2416 (2014)Google Scholar
  6. 6.
    Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)Google Scholar
  7. 7.
    Deselaers, T., Alexe, B., Ferrari, V.: Weakly supervised localization and learning with generic knowledge. Int. J. Comput. Vis. 100(3), 275–293 (2012)MathSciNetCrossRefGoogle Scholar
  8. 8.
    Everingham, M., Eslami, S.M.A., Gool, L.V., Williams, C.K.I., Winn, J.M., Zisserman, A.: The PASCAL visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98–136 (2015)CrossRefGoogle Scholar
  9. 9.
    Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)CrossRefGoogle Scholar
  10. 10.
    Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)Google Scholar
  11. 11.
    Girshick, R., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016)CrossRefGoogle Scholar
  12. 12.
    He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)CrossRefGoogle Scholar
  13. 13.
    Hoiem, D., Chodpathumwan, Y., Dai, Q.: Diagnosing error in object detectors. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 340–353. Springer, Heidelberg (2012)Google Scholar
  14. 14.
    Hosang, J.H., Benenson, R., Dollár, P., Schiele, B.: What makes for effective detection proposals? IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 814–830 (2016)CrossRefGoogle Scholar
  15. 15.
    Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
  16. 16.
    Joulin, A., Bach, F.R., Ponce, J.: Discriminative clustering for image co-segmentation. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition, pp. 1943–1950 (2010)Google Scholar
  17. 17.
    Joulin, A., Tang, K., Fei-Fei, L.: Efficient image and video co-localization with Frank-Wolfe algorithm. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VI. LNCS, vol. 8694, pp. 253–268. Springer, Heidelberg (2014)Google Scholar
  18. 18.
    Krause, J., Jin, H., Yang, J., Li, F.: Fine-grained recognition without part annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5546–5555 (2015)Google Scholar
  19. 19.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 1106–1114 (2012)Google Scholar
  20. 20.
    Küttel, D., Ferrari, V.: Figure-ground segmentation by transferring window masks. In: Proceedings of the EEE Conference on Computer Vision and Pattern Recognition, pp. 558–565 (2012)Google Scholar
  21. 21.
    Kwak, S., Cho, M., Ponce, J., Schmid, C., Laptev, I.: Unsupervised object discovery and tracking in video collections. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3173–3181 (2015)Google Scholar
  22. 22.
    Parkhi, O.M., Vedaldi, A., Jawahar, C.V., Zisserman, A.: The truth about cats and dogs. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1427–1434 (2011)Google Scholar
  23. 23.
    Prest, A., Leistner, C., Civera, J., Schmid, C., Ferrari, V.: Learning object class detectors from weakly annotated video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3282–3289 (2012)Google Scholar
  24. 24.
    Ren, W., Huang, K., Tao, D., Tan, T.: Weakly supervised large scale object localization with multiple instance learning and bag splitting. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 405–416 (2016)CrossRefGoogle Scholar
  25. 25.
    Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)CrossRefGoogle Scholar
  26. 26.
    Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1939–1946 (2013)Google Scholar
  27. 27.
    Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Li, F.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)MathSciNetCrossRefGoogle Scholar
  28. 28.
    Shi, Z., Hospedales, T.M., Xiang, T.: Bayesian joint topic modelling for weakly supervised object localisation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2984–2991 (2013)Google Scholar
  29. 29.
    Siva, P., Xiang, T.: Weakly supervised object detector learning with model drift detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 343–350 (2011)Google Scholar
  30. 30.
    Tang, K., Joulin, A., Li, L., Li, F.: Co-localization in real-world images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1464–1471 (2014)Google Scholar
  31. 31.
    Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)CrossRefGoogle Scholar
  32. 32.
    Galleguillos, C., Babenko, B., Rabinovich, A., Belongie, S., Wang, C., Ren, W., Huang, K., Tan, T.: Weakly supervised object localization with latent category learning. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VI. LNCS, vol. 8694, pp. 431–445. Springer, Heidelberg (2014)Google Scholar
  33. 33.
    Wang, X., Zhu, Z., Yao, C., Bai, X.: Relaxed multiple-instance SVM with application to object discovery. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1224–1232 (2015)Google Scholar
  34. 34.
    Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 391–405. Springer, Heidelberg (2014)Google Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Yao Li
    • 1
  • Lingqiao Liu
    • 1
  • Chunhua Shen
    • 1
    Email author
  • Anton van den Hengel
    • 1
  1. 1.School of Computer ScienceThe University of AdelaideAdelaideAustralia

Personalised recommendations