Localizing Objects While Learning Their Appearance

  • Thomas Deselaers
  • Bogdan Alexe
  • Vittorio Ferrari
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6314)


Learning a new object class from cluttered training images is very challenging when the locations of the object instances are unknown. Previous work generally requires the objects to cover a large portion of each image. We present a novel approach that can cope with extensive clutter as well as large scale and appearance variations between object instances. To make this possible, we propose a conditional random field that starts from generic knowledge and then progressively adapts to the new class. Our approach simultaneously localizes object instances and learns an appearance model specific to the class. We demonstrate this on the challenging Pascal VOC 2007 dataset. Furthermore, our method makes it possible to train any state-of-the-art object detector in a weakly supervised fashion, even though such a detector would normally require object location annotations.
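The alternation described in the abstract (localize instances, then re-learn the class appearance) can be illustrated with a toy sketch. This is not the authors' conditional random field: it is a simplified stand-in that assumes each image comes with a few candidate windows, each with a descriptor vector and a class-generic "objectness" score. The function name `localize_and_learn`, the mean-descriptor appearance model, the `weight` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def localize_and_learn(windows, generic_scores, n_iters=10, weight=1.0):
    """Toy alternation between localization and appearance learning.

    windows:        list of (n_i, d) arrays, one descriptor per candidate window
    generic_scores: list of (n_i,) arrays, class-generic scores (assumed given)
    Returns the selected window index per image and the learned mean descriptor.
    """
    # Start from generic knowledge only: take the most object-like window.
    selected = [int(np.argmax(s)) for s in generic_scores]

    for _ in range(n_iters):
        # Learning step: class appearance model = mean of selected descriptors.
        model = np.mean([w[k] for w, k in zip(windows, selected)], axis=0)

        # Localization step: trade off generic score against distance to model.
        new_selected = []
        for w, s in zip(windows, generic_scores):
            dist = np.linalg.norm(w - model, axis=1)
            new_selected.append(int(np.argmax(s - weight * dist)))

        if new_selected == selected:  # no window changed: converged
            break
        selected = new_selected

    # Final appearance model for the final selection.
    model = np.mean([w[k] for w, k in zip(windows, selected)], axis=0)
    return selected, model


# Hypothetical toy data: three images, three candidate windows each.
# In the third image the generic score prefers a clutter window.
windows = [
    np.array([[1.0, 0.0], [5.0, 5.0], [-4.0, 2.0]]),
    np.array([[3.0, -6.0], [1.1, 0.1], [-2.0, -2.0]]),
    np.array([[0.9, -0.1], [6.0, 1.0], [0.0, 7.0]]),
]
generic_scores = [
    np.array([0.9, 0.2, 0.1]),
    np.array([0.3, 0.8, 0.1]),
    np.array([0.5, 0.6, 0.2]),
]

selected, model = localize_and_learn(windows, generic_scores)
print(selected, model)
```

In this toy data the generic score alone picks a clutter window in the third image. Once an appearance model has been estimated from the other selections, the localization step switches to the window that agrees with the rest of the class.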


Keywords (added by machine, not by the authors): Training Image · Object Class · Appearance Model · Target Class · Learning Stage



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Thomas Deselaers (1)
  • Bogdan Alexe (1)
  • Vittorio Ferrari (1)
  1. Computer Vision Laboratory, ETH Zurich, Zurich, Switzerland
