ImageNet Auto-Annotation with Segmentation Propagation

International Journal of Computer Vision

Abstract

ImageNet is a large-scale hierarchical database of object classes with millions of images. We propose to automatically populate it with pixelwise object-background segmentations by leveraging existing manual annotations in the form of class labels and bounding boxes. The key idea is to recursively exploit the images segmented so far to guide the segmentation of new images. At each stage, this propagation process expands into the images that are easiest to segment at that point in time, e.g. by moving to the classes most closely related semantically to those segmented so far. The propagation of segmentation occurs both (a) at the image level, by transferring existing segmentations to estimate the probability that a pixel is foreground, and (b) at the class level, by jointly segmenting images of the same class and by importing the appearance models of classes that are already segmented. Through experiments on 577 classes and 500k images we show that our technique (i) annotates a wide range of classes with accurate segmentations; (ii) effectively exploits the hierarchical structure of ImageNet; (iii) scales efficiently, especially when implemented on superpixels; and (iv) outperforms a baseline GrabCut (Rother et al. 2004) initialized on the image center, as well as segmentation transfer from a fixed source pool run independently on each target image (Kuettel and Ferrari 2012). Moreover, our method delivers state-of-the-art results on the recent iCoseg dataset for co-segmentation.
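The two propagation ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stage_order` and `foreground_probability` are hypothetical, "semantically most related" is approximated here by hop distance in the class hierarchy, and the image-level transfer is reduced to a uniform average of source masks that are assumed to be already aligned with the target image.

```python
from collections import deque


def stage_order(edges, seeds):
    """Group classes into stages by hop distance from the seed classes.

    edges: iterable of (parent, child) pairs from a class hierarchy.
    seeds: classes that already have segmentations (stage 0).
    Assumption: hop distance stands in for semantic relatedness.
    Returns a list of stages; stage k holds the classes k hops away.
    """
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    # Multi-source breadth-first search starting from all seed classes.
    distance = {cls: 0 for cls in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)

    stages = {}
    for cls, hops in distance.items():
        stages.setdefault(hops, []).append(cls)
    return [sorted(stages[hops]) for hops in sorted(stages)]


def foreground_probability(source_masks):
    """Average binary masks into a per-pixel foreground probability.

    source_masks: non-empty list of equally sized masks, each a list of
    rows of 0/1 values. Assumption: masks are already aligned to the target.
    """
    count = len(source_masks)
    height = len(source_masks[0])
    width = len(source_masks[0][0])
    return [
        [sum(mask[y][x] for mask in source_masks) / count for x in range(width)]
        for y in range(height)
    ]
```

With a toy hierarchy in which only "terrier" is segmented at the start, `stage_order` visits "dog" first, then "animal" and "poodle", and "cat" last, so each stage builds on the classes closest to those already segmented.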

Notes

  1. http://www.image-net.org/.

  2. Website: http://www.vision.ee.ethz.ch/~mguillau/imagenet.html?calvin.

  3. The pascal visual object classes challenge. http://pascallin.ecs.soton.ac.uk/challenges/VOC/.

  4. Ibid.

  5. Therefore, the numbers reported in Kuettel et al. (2012) are not directly comparable with the ones in this article.

  6. This differs from the conclusion we reached in our earlier paper (Kuettel et al. 2012). The output segmentations were affected by a bug in our GrabCut implementation, resulting in many erroneous segmentations. These errors were amplified through propagation, leading to the observation that performance decreased with stages. On average over all images, in Kuettel et al. (2012), we reported 77.1 % accuracy. When evaluated using the refined ground-truth, those segmentations yield 80.0 % accuracy and 37.3 % IoU, clearly below the correct result we report in this paper (84.4 % accuracy and 57.3 % IoU).
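Note 6 reports both pixel accuracy and IoU, and the two can diverge sharply when the object covers a small part of the image. The sketch below assumes the standard definitions (accuracy as the fraction of correctly labelled pixels, IoU as foreground intersection over foreground union); the function name `accuracy_and_iou` is hypothetical and the article's exact evaluation protocol may differ.

```python
def accuracy_and_iou(predicted, ground_truth):
    """Return (pixel accuracy, foreground IoU) for two binary masks.

    Both masks are lists of rows of 0/1 values with identical, non-empty shape.
    Assumption: IoU is computed on the foreground label only.
    """
    tp = fp = fn = tn = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1
            elif pred:
                fp += 1
            elif true:
                fn += 1
            else:
                tn += 1

    accuracy = (tp + tn) / (tp + fp + fn + tn)
    union = tp + fp + fn
    # Two empty foregrounds agree perfectly, so define IoU as 1.0 in that case.
    iou = tp / union if union else 1.0
    return accuracy, iou
```

For a ten-pixel image with two foreground pixels of which only one is predicted, this gives an accuracy of 0.9 but an IoU of 0.5, which shows how a segmentation can score well on accuracy while missing much of the object.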

References

  • The PASCAL Visual Object Classes. http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010). ClassCut for unsupervised class segmentation. In: Proceedings of the European Conference on Computer Vision.

  • Alexe, B., Deselaers, T., & Ferrari, V. (2010). What is an object? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Arora, H., Loeff, N., Forsyth, D., & Ahuja, N. (2007). Unsupervised segmentation of objects using efficient learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis.

  • Aytar, Y., & Zisserman, A. (2011). Tabula rasa: Model transfer for object category detection. In: Proceedings of the International Conference on Computer Vision.

  • Aytar, Y., & Zisserman, A. (2012). Enhancing exemplar SVMs using part level transfer regularization. In: Proceedings of the British Machine Vision Conference.

  • Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: Interactive co-segmentation with intelligent scribble guidance. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3169–3176).

  • Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2011). Interactively co-segmentating topically related images with intelligent scribble guidance. International Journal of Computer Vision, 93(3), 273–292.

  • Bertelli, L., Yu, T., Vu, D., & Gokturk, S. (2011). Kernelized structural SVM learning for supervised object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Blake, A., Rother, C., Brown, M., Perez, P., & Torr, P. (2004). Interactive image segmentation using an adaptive GMMRF model. In: Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic.

  • Borenstein, E., Sharon, E., & Ullman, S. (2004). Combining top-down and bottom-up segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC.

  • Boykov, Y., & Jolly, M.P. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proceedings of the 8th International Conference on Computer Vision, Vancouver, Canada.

  • Carreira, J., & Sminchisescu, C. (2010). Constrained parametric min cuts for automatic object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Chai, Y., Lempitsky, V., & Zisserman, A. (2011). BiCoS: A bi-level co-segmentation method for image classification. In: Proceedings of the International Conference on Computer Vision (pp. 2579–2586).

  • Chai, Y., Rahtu, E., Lempitsky, V., Gool, L.V., & Zisserman, A. (2012). TriCoS: A tri-level class-discriminative co-segmentation method for image classification. In: Proceedings of the European Conference on Computer Vision.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2, 886–893.

  • Deng, J., Berg, A., Satheesh, S., Su, H., Khosla, A., & Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012). http://www.image-net.org/challenges/LSVRC/2012/

  • Deng, J., Berg, A.C., Li, K., & Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us? In: Proceedings of the European Conference on Computer Vision.

  • Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Deng, J., Krause, J., Berg, A., & Fei-Fei, L. (2012). Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Deng, J., Satheesh, S., Berg, A., & Fei-Fei, L. (2011). Fast and balanced: Efficient label tree learning for large scale object recognition. In: Advances in Neural Information Processing Systems.

  • Deselaers, T., Alexe, B., & Ferrari, V. (2010). Localizing objects while learning their appearance. In: Proceedings of the European Conference on Computer Vision.

  • Deselaers, T., & Ferrari, V. (2011). Visual and semantic similarity in ImageNet. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Endres, I., & Hoiem, D. (2010). Category independent object proposals. In: Proceedings of the European Conference on Computer Vision.

  • Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In: CVPR Workshop of Generative Model Based Vision.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

  • Gong, Y., & Lazebnik, S. (2011). Iterative quantization: A procrustean approach to learning binary codes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Guillaumin, M., & Ferrari, V. (2012). Large-scale knowledge transfer for object localization in ImageNet. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Guillaumin, M., Mensink, T., Verbeek, J., & Schmid, C. (2009). TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In: Proceedings of the International Conference on Computer Vision.

  • Hays, J., & Efros, A. (2007). Scene completion using millions of photographs. In: Proceedings of the ACM SIGGRAPH Conference on Computer Graphics.

  • Jiang, H. (2009). Human pose estimation using consistent max-covering. In: Proceedings of the International Conference on Computer Vision.

  • Jojic, N., Perina, A., Cristani, M., Murino, V., & Frey, B. (2009). Stel component analysis: Modeling spatial correlations in image class structure. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Joulin, A., Bach, F., & Ponce, J. (2010). Discriminative clustering for image co-segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1943–1950).

  • Kim, G., Xing, E., Fei-Fei, L., & Kanade, T. (2011). Distributed cosegmentation via submodular optimization on anisotropic diffusion. In: Proceedings of the International Conference on Computer Vision (pp. 169–176).

  • Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems.

  • Kuettel, D., & Ferrari, V. (2012). Figure-ground segmentation by transferring window masks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Kuettel, D., Guillaumin, M., & Ferrari, V. (2012). Segmentation Propagation in ImageNet. In: Proceedings of the European Conference on Computer Vision.

  • Ladicky, L., Russell, C., & Kohli, P. (2009). Associative hierarchical CRFs for object class image segmentation. In: Proceedings of the International Conference on Computer Vision.

  • Lampert, C., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Li, F., Carreira, J., & Sminchisescu, C. (2010). Object recognition as ranking holistic figure-ground hypotheses. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., Cao, L., & Huang, C. (2011). Large-scale image classification: fast feature extraction and SVM training. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Liu, C., Yuen, J., & Torralba, A. (2009). Nonparametric scene parsing: Label transfer via dense scene alignment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Malisiewicz, T., Gupta, A., & Efros, A.A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In: Proceedings of the International Conference on Computer Vision.

  • Mukherjee, L., Singh, V., Xu, J., & Collins, M.D. (2012). Analyzing the subspace structure of related images: Concurrent segmentation of image sets. In: Proceedings of the European Conference on Computer Vision.

  • Norouzi, M., Punjani, A., & Fleet, D.J. (2012). Fast search in Hamming space with multi-index hashing. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Ott, P., & Everingham, M. (2011). Shared parts for deformable part-based models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Quattoni, A., Collins, M., & Darrell, T. (2008). Transfer learning for image classification with sparse prototype representations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Rohrbach, M., Stark, M., Szarvas, G., Gurevych, I., & Schiele, B. (2010). What helps where and why? Semantic relatedness for knowledge transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Rosenfeld, A., & Weinshall, D. (2011). Extracting foreground masks towards object recognition. In: Proceedings of the International Conference on Computer Vision.

  • Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: Interactive foreground extraction using iterated graph cuts. In: Proceedings of the ACM SIGGRAPH Conference on Computer Graphics.

  • Rother, C., Kolmogorov, V., Minka, T., & Blake, A. (2006). Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Russell, B., Torralba, A., Liu, C., & Fergus, R. (2007). Object recognition by scene alignment. In: Advances in Neural Information Processing Systems.

  • Salakhutdinov, R., Torralba, A., & Tenenbaum, J. (2011). Learning to share visual appearance for multiclass object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Van de Sande, K., Uijlings, J.R.R., Gevers, T., & Smeulders, A. (2011). Segmentation as selective search for object recognition. In: Proceedings of the International Conference on Computer Vision.

  • Schoenemann, T., & Cremers, D. (2007). Introducing curvature into globally optimal image segmentation: Minimum ratio cycles on product graphs. In: Proceedings of the 11th International Conference on Computer Vision, Rio de Janeiro, Brazil.

  • Shotton, J., Blake, A., & Cipolla, R. (2005). Contour-based learning for object detection. In: Proceedings of the International Conference on Computer Vision.

  • Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: Proceedings of the European Conference on Computer Vision.

  • Stark, M., Goesele, M., & Schiele, B. (2009). A shape-based object class model for knowledge transfer. In: Proceedings of the International Conference on Computer Vision.

  • Szummer, M., Kohli, P., & Hoiem, D. (2008). Learning CRFs using graph cuts. In: Proceedings of the European Conference on Computer Vision.

  • Tighe, J., & Lazebnik, S. (2010). Superparsing: Scalable nonparametric image parsing with superpixels. In: Proceedings of the European Conference on Computer Vision.

  • Tommasi, T., Orabona, F., & Caputo, B. (2010). Safety in numbers: Learning categories from few examples with multi model knowledge transfer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Torralba, A., Fergus, R., & Weiss, Y. (2008). Small codes and large image databases for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.

  • Tu, Z., Chen, X., Yuille, A., & Zhu, S. (2005). Image parsing: Unifying segmentation, detection, and recognition. International Journal of Computer Vision, 63(2), 113–140.

  • Veksler, O., Boykov, Y., & Mehrani, P. (2010). Superpixels and supervoxels in an energy optimization framework. In: Proceedings of the European Conference on Computer Vision (pp. 211–224).

  • Verbeek, J., Nunnink, J., & Vlassis, N. (2006). Accelerated EM-based clustering of large data sets. Data Mining and Knowledge Discovery, 13(3), 291–307.

  • Verbeek, J., & Triggs, B. (2007). Region classification with Markov field aspect models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Vicente, S., Kolmogorov, V., & Rother, C. (2008). Graph cut based image segmentation with connectivity priors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska.

  • Vicente, S., Rother, C., & Kolmogorov, V. (2011). Object cosegmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2217–2224).

  • Wang, J., & Cohen, M. (2005). An iterative optimization approach for unified image segmentation and matting. In: Proceedings of the 10th International Conference on Computer Vision, Beijing, China.

  • Winn, J., & Jojic, N. (2005). LOCUS: Learning object classes with unsupervised segmentation. In: Proceedings of the International Conference on Computer Vision.

Author information

Corresponding author

Correspondence to Matthieu Guillaumin.

Additional information

Communicated by Carlo Colombo.

About this article

Cite this article

Guillaumin, M., Küttel, D. & Ferrari, V. ImageNet Auto-Annotation with Segmentation Propagation. Int J Comput Vis 110, 328–348 (2014). https://doi.org/10.1007/s11263-014-0713-9

  • DOI: https://doi.org/10.1007/s11263-014-0713-9
