Weakly Supervised Localization and Learning with Generic Knowledge

International Journal of Computer Vision

Abstract

Learning a new object class from cluttered training images is very challenging when the location of object instances is unknown, i.e., in a weakly supervised setting. Many previous works require objects to cover a large portion of each image. We present a novel approach that can cope with extensive clutter as well as large scale and appearance variations between object instances. To make this possible, we exploit generic knowledge learned beforehand from images of other classes for which location annotation is available. Generic knowledge facilitates learning any new class from weakly supervised images because it reduces the uncertainty in the location of its object instances. We propose a conditional random field that starts from generic knowledge and then progressively adapts to the new class. Our approach simultaneously localizes object instances while learning an appearance model specific to the class. We demonstrate this on several datasets, including the very challenging Pascal VOC 2007. Furthermore, our method allows training any state-of-the-art object detector in a weakly supervised fashion, although such a detector would normally require object location annotations.

Notes

  1. This differs from the setting in the previous version of this work (Deselaers et al. 2010), where we used a smaller subset of images selected by Chum and Zisserman (2007), which are considerably easier as most of them contain a large dominant object.

  2. Baseline suggested by C. Lampert in personal communication.

  3. http://www.di.ens.fr/~russell/projects/mult_seg_discovery/index.html

  4. Unfortunately, we could not obtain the source code from Chum and Zisserman (2007). We asked them to process our Pascal07-6x2 training sets and they confirmed that their method performs poorly on them.

  5. Derived from the PR plots in their paper (Fig. 5).

  6. The source code is available at http://people.cs.uchicago.edu/~pff/latent/.

References

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010a). ClassCut for unsupervised class segmentation. In ECCV.

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010b). What is an object? In CVPR.

  • Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. In NIPS.

  • Arora, H., Loeff, N., Forsyth, D., & Ahuja, N. (2007). Unsupervised segmentation of objects using efficient learning. In CVPR.

  • Babenko, B., Branson, S., & Belongie, S. (2009). Similarity metrics for categorization: From monolithic to category specific. In ICCV.

  • Bagon, S., Brostovski, O., Galun, M., & Irani, M. (2010). Detecting and sketching the common. In CVPR.

  • Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). SURF: speeded up robust features. Computer Vision and Image Understanding, 110(3), 346–359.

  • Blaschko, M. B., Vedaldi, A., & Zisserman, A. (2010). Simultaneous object detection and ranking with weak supervision. In NIPS.

  • Borenstein, E., & Ullman, S. (2004). Learning to segment. In ECCV.

  • Cao, L., & Li, F. F. (2007). Spatially coherent latent topic model for concurrent segmentation and classification of objects and scene. In ICCV.

  • Carreira, J., Li, F., & Sminchisescu, C. (2010). Constrained parametric min cuts for automatic object segmentation. In CVPR.

  • Chen, Y., Bi, J., & Wang, J. Z. (2006). MILES: multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 1931–1947.

  • Chum, O., & Zisserman, A. (2007). An exemplar model for learning object classes. In CVPR.

  • Crandall, D. J., & Huttenlocher, D. (2006). Weakly supervised learning of part-based spatial models for visual object recognition. In ECCV.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.

  • Deselaers, T., & Ferrari, V. (2010). A conditional random field for multiple-instance learning. In ICML.

  • Deselaers, T., Alexe, B., & Ferrari, V. (2010). Localizing objects while learning their appearance. In ECCV.

  • Dorkó, G., & Schmid, C. (2005). Object class recognition using discriminative local features. Tech. Rep. RR-5497, INRIA, Rhone-Alpes.

  • Endres, I., & Hoiem, D. (2010). Category independent object proposals. In ECCV.

  • Everingham, M., Van Gool, L., Williams, C. K. I., & Zisserman, A. (2006). The PASCAL Visual Object Classes Challenge 2006 (VOC2006). http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2006/.

  • Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2007). The PASCAL Visual Object Classes Challenge 2007 Results.

  • Everingham, M., et al. (2010). The PASCAL Visual Object Classes Challenge 2010 Results.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2003). A Bayesian approach to unsupervised one-shot learning of object categories. In ICCV (pp. 1134–1141).

  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR workshop on generative-model based vision.

  • Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In CVPR.

  • Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In ICML.

  • Fritz, M., & Schiele, B. (2006). Towards unsupervised discovery of visual categories. In DAGM.

  • Frome, A., Singer, Y., Sha, F., & Malik, J. (2007). Learning globally-consistent local distance functions for shape-based image retrieval and classification. In ICCV.

  • Gaidon, A., Marszalek, M., & Schmid, C. (2009). Mining visual actions from movies. In BMVC.

  • Galleguillos, C., Babenko, B., Rabinovich, A., & Belongie, S. (2008). Weakly supervised object localization with stable segmentations. In ECCV.

  • Grauman, K., & Darrell, T. (2006). Unsupervised learning of categories from sets of partially matching image features. In CVPR.

  • Kim, G., & Torralba, A. (2009). Unsupervised detection of regions of interest using iterative link analysis. In NIPS.

  • Kolmogorov, V. (2006a). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.

  • Lampert, C., Nickisch, H., & Harmeling, S. (2009a). Learning to detect unseen object classes by between-class attribute transfer. In CVPR.

  • Lampert, C. H., Blaschko, M. B., & Hofmann, T. (2009b). Efficient subwindow search: A branch and bound framework for object localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2129–2142.

  • Lando, M., & Edelman, S. (1995). Generalization from a single view in face recognition. Technical report CS-TR 95-02, The Weizmann Institute of Science.

  • Lee, Y., & Grauman, K. (2009a). Shape discovery from unlabeled image collections. In CVPR.

  • Lee, Y. J., & Grauman, K. (2009b). Foreground focus: unsupervised learning from partially matching images. International Journal of Computer Vision, 85, 143–166.

  • Malisiewicz, T., & Efros, A. A. (2008). Recognition by association via learning per-exemplar distances. In CVPR.

  • Nguyen, M., Torresani, L., de la Torre, F., & Rother, C. (2009). Weakly supervised discriminative localization and classification: a joint learning process. In ICCV.

  • Nowak, E., & Jurie, F. (2007). Learning visual similarity measures for comparing never seen objects. In CVPR.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Payet, N., & Todorovic, S. (2010). From a set of shapes to object discovery. In ECCV.

  • Quattoni, A., Collins, M., & Darrell, T. (2008). Transfer learning for image classification with sparse prototype representations. In CVPR.

  • Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. (2007). Self-taught learning: transfer learning from unlabeled data. In ICML.

  • Ramanan, D. (2006). Learning to parse images of articulated bodies. In NIPS.

  • Rohrbach, M., Stark, M., Szarvas, G., Gurevych, I., & Schiele, B. (2010). What helps where—and why? semantic relatedness for knowledge transfer. In CVPR.

  • Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3), 309–314.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.

  • Russell, B. C., Efros, A. A., Sivic, J., Freeman, W. T., & Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In CVPR.

  • Stark, M., Goesele, M., & Schiele, B. (2009). A shape-based object class model for knowledge transfer. In ICCV.

  • Szummer, M., Kohli, P., & Hoiem, D. (2008). Learning CRFs using graph cuts. In ECCV.

  • Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In NIPS.

  • Todorovic, S., & Ahuja, N. (2006). Extracting subimages of an unknown category from a set of images. In CVPR.

  • Tommasi, T., & Caputo, B. (2009). The more you know, the less you learn: from knowledge transfer to one-shot learning of object categories. In BMVC.

  • Tommasi, T., Orabona, F., & Caputo, B. (2010). Safety in numbers: learning categories from few examples with multi model knowledge transfer. In CVPR.

  • Torresani, L., Szummer, M., & Fitzgibbon, A. (2010). Efficient object category recognition using classemes. In ECCV.

  • Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.

  • Viola, P. A., Platt, J., & Zhang, C. (2005). Multiple instance boosting for object detection. In NIPS.

  • Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2005). Distance metric learning for large margin nearest neighbor classification. In NIPS.

  • Winn, J., & Jojic, N. (2005a). LOCUS: learning object classes with unsupervised segmentation. In ICCV.

  • Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: a comprehensive study. International Journal of Computer Vision, 73(2), 213–238.

Acknowledgements

The authors gratefully acknowledge support from the Swiss National Science Foundation.

Author information

Corresponding author

Correspondence to Thomas Deselaers.

About this article

Cite this article

Deselaers, T., Alexe, B. & Ferrari, V. Weakly Supervised Localization and Learning with Generic Knowledge. Int J Comput Vis 100, 275–293 (2012). https://doi.org/10.1007/s11263-012-0538-3

  • DOI: https://doi.org/10.1007/s11263-012-0538-3
