Abstract
Learning a new object class from cluttered training images is very challenging when the location of object instances is unknown, i.e., in a weakly supervised setting. Many previous works require objects to cover a large portion of the image. We present a novel approach that can cope with extensive clutter as well as large scale and appearance variations between object instances. To make this possible, we exploit generic knowledge learned beforehand from images of other classes for which location annotation is available. Generic knowledge facilitates learning any new class from weakly supervised images because it reduces the uncertainty in the location of its object instances. We propose a conditional random field that starts from generic knowledge and then progressively adapts to the new class. Our approach simultaneously localizes object instances while learning an appearance model specific to the class. We demonstrate this on several datasets, including the very challenging Pascal VOC 2007. Furthermore, our method allows training any state-of-the-art object detector in a weakly supervised fashion, although such detectors would normally require object location annotations.
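The alternation described above, localizing instances and then adapting an appearance model to the class, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: each image is reduced to a short list of candidate windows with toy descriptors, a hypothetical class-generic `objectness` score plays the role of generic knowledge, the pairwise term is a squared distance between selected windows, the class-specific model is just a mean descriptor, and inference is simple iterated conditional modes.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def energy(i, k, windows, objectness, sel, class_mean, alpha):
    """Cost of choosing window k in image i, given the other selections."""
    n = len(windows)
    # Unary term from generic knowledge (assumed objectness score in (0, 1]).
    e = -math.log(objectness[i][k])
    # Pairwise term: the selected windows should look alike across images.
    others = [j for j in range(n) if j != i]
    if others:
        e += alpha * sum(sq_dist(windows[i][k], windows[j][sel[j]])
                         for j in others) / len(others)
    # Class-specific term, available only after the first adaptation step.
    if class_mean is not None:
        e += sq_dist(windows[i][k], class_mean)
    return e

def localize_and_learn(windows, objectness, n_rounds=3, n_sweeps=5, alpha=1.0):
    """Alternate between selecting one window per image and re-estimating
    a (toy) class-specific appearance model from the selected windows."""
    n = len(windows)
    dim = len(windows[0][0])
    # Start from generic knowledge only: the most object-like window per image.
    sel = [max(range(len(objectness[i])), key=lambda k: objectness[i][k])
           for i in range(n)]
    class_mean = None
    for _ in range(n_rounds):
        # Localization: iterated conditional modes over the images.
        for _ in range(n_sweeps):
            changed = False
            for i in range(n):
                best = min(range(len(windows[i])),
                           key=lambda k: energy(i, k, windows, objectness,
                                                sel, class_mean, alpha))
                if best != sel[i]:
                    sel[i] = best
                    changed = True
            if not changed:
                break
        # Adaptation: mean descriptor of the currently selected windows.
        class_mean = [sum(windows[i][sel[i]][d] for i in range(n)) / n
                      for d in range(dim)]
    return sel, class_mean

# Toy data: three images, three candidate windows each, 2-D descriptors.
toy_windows = [
    [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)],
    [(0.0, 0.9), (0.9, 0.1), (0.3, 0.3)],
    [(0.2, 0.8), (0.6, 0.6), (1.0, 0.1)],
]
toy_objectness = [
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.2, 0.3],  # generic knowledge alone prefers the wrong window here
]
toy_selection, toy_model = localize_and_learn(toy_windows, toy_objectness)
```

In the toy data, the generic score alone picks a clutter window in the third image; the similarity term then moves the selection to the window that resembles those chosen in the other two images. This only mirrors the high-level idea of combining generic and class-specific cues, not the potentials or the inference procedure actually used in the article.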
Notes
Baseline suggested by C. Lampert in personal communication.
Unfortunately, we could not obtain the source code from Chum and Zisserman (2007). We asked them to process our Pascal07-6x2 training sets and they confirmed that their method performs poorly on them.
Derived from the PR plots in their paper (Fig. 5).
The source code is available at http://people.cs.uchicago.edu/~pff/latent/.
References
Alexe, B., Deselaers, T., & Ferrari, V. (2010a). ClassCut for unsupervised class segmentation. In ECCV.
Alexe, B., Deselaers, T., & Ferrari, V. (2010b). What is an object? In CVPR.
Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. In NIPS.
Arora, H., Loeff, N., Forsyth, D., & Ahuja, N. (2007). Unsupervised segmentation of objects using efficient learning. In CVPR.
Babenko, B., Branson, S., & Belongie, S. (2009). Similarity metrics for categorization: From monolithic to category specific. In ICCV.
Bagon, S., Brostovski, O., Galun, M., & Irani, M. (2010). Detecting and sketching the common. In CVPR.
Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). SURF: speeded up robust features. Computer Vision and Image Understanding.
Blaschko, M. B., Vedaldi, A., & Zisserman, A. (2010). Simultaneous object detection and ranking with weak supervision. In NIPS.
Borenstein, E., & Ullman, S. (2004). Learning to segment. In ECCV.
Cao, L., & Fei-Fei, L. (2007). Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes. In ICCV.
Carreira, J., Li, F., & Sminchisescu, C. (2010). Constrained parametric min cuts for automatic object segmentation. In CVPR.
Chen, Y., Bi, J., & Wang, J. Z. (2006). MILES: multiple-instance learning via embedded instance selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12), 1931–1947.
Chum, O., & Zisserman, A. (2007). An exemplar model for learning object classes. In CVPR.
Crandall, D. J., & Huttenlocher, D. (2006). Weakly supervised learning of part-based spatial models for visual object recognition. In ECCV.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.
Deselaers, T., & Ferrari, V. (2010). A conditional random field for multiple-instance learning. In ICML.
Deselaers, T., Alexe, B., & Ferrari, V. (2010). Localizing objects while learning their appearance. In ECCV.
Dorkó, G., & Schmid, C. (2005). Object class recognition using discriminative local features. Tech. Rep. RR-5497, INRIA Rhône-Alpes.
Endres, I., & Hoiem, D. (2010). Category independent object proposals. In ECCV.
Everingham, M., Van Gool, L., Williams, C. K. I., & Zisserman, A. (2006). The PASCAL Visual Object Classes Challenge 2006 (VOC2006). http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2006/.
Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2007). The PASCAL Visual Object Classes Challenge 2007 Results.
Everingham, M., et al. (2010). The PASCAL Visual Object Classes Challenge 2010 Results.
Fei-Fei, L., Fergus, R., & Perona, P. (2003). A Bayesian approach to unsupervised one-shot learning of object categories. In ICCV (pp. 1134–1141).
Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR Workshop on Generative-Model Based Vision.
Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In CVPR.
Finley, T., & Joachims, T. (2008). Training structural SVMs when exact inference is intractable. In ICML.
Fritz, M., & Schiele, B. (2006). Towards unsupervised discovery of visual categories. In DAGM.
Frome, A., Singer, Y., Sha, F., & Malik, J. (2007). Learning globally-consistent local distance functions for shape-based image retrieval and classification. In ICCV.
Gaidon, A., Marszalek, M., & Schmid, C. (2009). Mining visual actions from movies. In BMVC.
Galleguillos, C., Babenko, B., Rabinovich, A., & Belongie, S. (2008). Weakly supervised object localization with stable segmentations. In ECCV.
Grauman, K., & Darrell, T. (2006). Unsupervised learning of categories from sets of partially matching image features. In CVPR.
Kim, G., & Torralba, A. (2009). Unsupervised detection of regions of interest using iterative link analysis. In NIPS.
Kolmogorov, V. (2006a). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.
Kolmogorov, V. (2006b). Convergent tree-reweighted message passing for energy minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1568–1583.
Lampert, C., Nickisch, H., & Harmeling, S. (2009a). Learning to detect unseen object classes by between-class attribute transfer. In CVPR.
Lampert, C. H., Blaschko, M. B., & Hofmann, T. (2009b). Efficient subwindow search: A branch and bound framework for object localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2129–2142.
Lando, M., & Edelman, S. (1995). Generalization from a single view in face recognition. Technical Report CS-TR 95-02, The Weizmann Institute of Science.
Lee, Y., & Grauman, K. (2009a). Shape discovery from unlabeled image collections. In CVPR.
Lee, Y. J., & Grauman, K. (2009b). Foreground focus: unsupervised learning from partially matching images. International Journal of Computer Vision, 85, 143–166.
Malisiewicz, T., & Efros, A. A. (2008). Recognition by association via learning per-exemplar distances. In CVPR.
Nguyen, M., Torresani, L., de la Torre, F., & Rother, C. (2009). Weakly supervised discriminative localization and classification: a joint learning process. In ICCV.
Nowak, E., & Jurie, F. (2007). Learning visual similarity measures for comparing never seen objects. In CVPR.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Payet, N., & Todorovic, S. (2010). From a set of shapes to object discovery. In ECCV.
Quattoni, A., Collins, M., & Darrell, T. (2008). Transfer learning for image classification with sparse prototype representations. In CVPR.
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. (2007). Self-taught learning: transfer learning from unlabeled data. In ICML.
Ramanan, D. (2006). Learning to parse images of articulated bodies. In NIPS.
Rohrbach, M., Stark, M., Szarvas, G., Gurevych, I., & Schiele, B. (2010). What helps where – and why? Semantic relatedness for knowledge transfer. In CVPR.
Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: interactive foreground extraction using iterated graph cuts. ACM Transactions on Graphics, 23(3), 309–314.
Russell, B. C., & Torralba, A. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.
Russell, B. C., Efros, A. A., Sivic, J., Freeman, W. T., & Zisserman, A. (2006). Using multiple segmentations to discover objects and their extent in image collections. In CVPR.
Stark, M., Goesele, M., & Schiele, B. (2009). A shape-based object class model for knowledge transfer. In ICCV.
Szummer, M., Kohli, P., & Hoiem, D. (2008). Learning CRFs using graph cuts. In ECCV.
Thrun, S. (1996). Is learning the n-th thing any easier than learning the first? In NIPS.
Todorovic, S., & Ahuja, N. (2006). Extracting subimages of an unknown category from a set of images. In CVPR.
Tommasi, T., & Caputo, B. (2009). The more you know, the less you learn: from knowledge transfer to one-shot learning of object categories. In BMVC.
Tommasi, T., Orabona, F., & Caputo, B. (2010). Safety in numbers: learning categories from few examples with multi model knowledge transfer. In CVPR.
Torresani, L., Szummer, M., & Fitzgibbon, A. (2010). Efficient object category recognition using classemes. In ECCV.
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Viola, P. A., Platt, J., & Zhang, C. (2005). Multiple instance boosting for object detection. In NIPS.
Weinberger, K. Q., Blitzer, J., & Saul, L. K. (2005). Distance metric learning for large margin nearest neighbor classification. In NIPS.
Winn, J., & Jojic, N. (2005a). LOCUS: learning object classes with unsupervised segmentation. In ICCV.
Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: a comprehensive study. International Journal of Computer Vision, 73(2), 213–238.
Acknowledgements
The authors gratefully acknowledge support from the Swiss National Science Foundation.
Cite this article
Deselaers, T., Alexe, B. & Ferrari, V. Weakly Supervised Localization and Learning with Generic Knowledge. Int J Comput Vis 100, 275–293 (2012). https://doi.org/10.1007/s11263-012-0538-3
DOI: https://doi.org/10.1007/s11263-012-0538-3