Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence

Rubinstein, Michael; Liu, Ce; Freeman, William T.

doi:10.1007/978-3-319-23048-1_11

Abstract

We present a principled framework for inferring pixel labels in weakly annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often unavailable. We construct a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and helps resolve ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker local evidence. We apply the framework to two computer vision problems: image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images that contain a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling, while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method discovers and segments objects well even in the presence of substantial numbers of noise images (images that do not contain the common object), as is typical of datasets collected from Internet search.
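The abstract describes inferring labels jointly over a graph that links all images, so that annotations flow from images with strong evidence to images with weak evidence. As a toy illustration of that idea only, the sketch below uses plain graph-based label propagation (the "local and global consistency" iteration of Zhou et al.), swapped in here for clarity; it is not the authors' graphical model or solver. Nodes stand for pixels pooled from all images, edge weights stand for hypothetical correspondence similarities, and `propagate_labels`, `seeds` and `alpha` are names introduced for this example.

```python
def propagate_labels(num_nodes, edges, seeds, num_labels, alpha=0.9, iters=100):
    """Toy joint label propagation over a (hypothetical) correspondence graph.

    num_nodes:  nodes are pixels/patches pooled from *all* images.
    edges:      list of (i, j, w) undirected edges with w > 0; w is an assumed
                similarity, e.g. from intra-image adjacency or dense matches.
    seeds:      dict node -> label for nodes with strong evidence
                (e.g. pixels of already-annotated images).
    Returns one inferred label index per node.
    """
    # Symmetric weighted adjacency stored as a list of dicts.
    adj = [dict() for _ in range(num_nodes)]
    for i, j, w in edges:
        adj[i][j] = adj[i].get(j, 0.0) + w
        adj[j][i] = adj[j].get(i, 0.0) + w
    deg = [sum(nbrs.values()) for nbrs in adj]

    # Y: one-hot prior for seeded nodes, all zeros for unlabeled nodes.
    Y = [[0.0] * num_labels for _ in range(num_nodes)]
    for node, lab in seeds.items():
        Y[node][lab] = 1.0

    # Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.
    F = [r[:] for r in Y]
    for _ in range(iters):
        new_F = []
        for i in range(num_nodes):
            row = [(1.0 - alpha) * y for y in Y[i]]
            for j, w in adj[i].items():
                s = w / ((deg[i] * deg[j]) ** 0.5)
                for k in range(num_labels):
                    row[k] += alpha * s * F[j][k]
            new_F.append(row)
        F = new_F

    # Label = highest-scoring class (isolated unseeded nodes fall back to 0).
    return [max(range(num_labels), key=lambda k: F[i][k]) for i in range(num_nodes)]


if __name__ == "__main__":
    # Image A = nodes 0, 1, 2 (labeled only at its ends); image B = nodes 3, 4, 5
    # (unlabeled). Edges 0-3 and 2-5 play the role of dense correspondences.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0),
             (0, 3, 1.0), (2, 5, 1.0)]
    labels = propagate_labels(6, edges, seeds={0: 0, 2: 1}, num_labels=2)
    print(labels)  # nodes 3 and 5 inherit labels 0 and 1 from image A
```

In this toy call, the unlabeled image B receives its labels only through the correspondence edges, which mirrors, in a very simplified form, the flow of annotations from images with strong evidence to images with weak evidence. Nodes equidistant from both seeds can end up tied.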

This work was done while Michael Rubinstein was a PhD student at MIT, during two summer internships at Microsoft Research, and while Ce Liu was a researcher at Microsoft Research.

Notes

1. In our experiments, we did not notice a significant difference in the results when computing the nearest-neighbor set using pyramid matching [28] instead of Gist.
2. Concurrently with the release of our paper, Kuettel et al. [27] improved the state-of-the-art precision on the iCoseg dataset to 91.4%.
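Note 1 refers to a nearest-neighbor set of images computed from a global descriptor. A minimal sketch of forming such a set follows, assuming each image is already summarized by a fixed-length vector (for example a precomputed Gist descriptor; computing Gist itself is not shown). The function name `knn_image_graph`, the Euclidean distance and the value of `k` are illustrative assumptions rather than the authors' settings; using pyramid matching instead would change only the pairwise scoring step.

```python
import math


def knn_image_graph(descriptors, k):
    """Return, for each image id, the ids of its k nearest neighbors.

    descriptors: dict mapping a (hypothetical) image id to a fixed-length
                 global descriptor, e.g. a precomputed Gist vector.
    Distances are Euclidean; ties are broken by image id so the result
    is deterministic.
    """
    ids = sorted(descriptors)
    graph = {}
    for a in ids:
        # Score every other image against image a.
        dists = [(math.dist(descriptors[a], descriptors[b]), b)
                 for b in ids if b != a]
        dists.sort()
        graph[a] = [b for _, b in dists[:k]]
    return graph
```

For instance, with two tight clusters `{"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [5.0, 5.0], "d": [5.1, 5.0]}` and `k=1`, each image links to the other member of its cluster: `{'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': ['c']}`. The brute-force scan is quadratic in the number of images, which is acceptable for a sketch but would call for an approximate nearest-neighbor index on large datasets.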

References

[1] Bagon, S., Brostovski, O., Galun, M., Irani, M.: Detecting and sketching the common. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33–40. IEEE, San Francisco (2010)
[2] Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.: PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 28(3), 24 (2009)
[3] Batra, D., Kowdle, A., Parikh, D., Luo, J., Chen, T.: iCoseg: interactive co-segmentation with intelligent scribble guidance. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3169–3176 (2010)
[4] Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
[5] Cheng, M.M., Zhang, G.X., Mitra, N.J., Huang, X., Hu, S.M.: Global contrast based salient region detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409–416 (2011)
[6] Collins, M.D., Xu, J., Grady, L., Singh, V.: Random walks based multi-image segmentation: quasiconvexity results and GPU-based solutions. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1656–1663 (2012)
[7] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)
[8] Delong, A., Gorelick, L., Schmidt, F.R., Veksler, O., Boykov, Y.: Interactive segmentation with super-labels. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 147–162 (2011)
[9] Faktor, A., Irani, M.: Clustering by composition–unsupervised discovery of image categories. In: European Conference on Computer Vision (ECCV), pp. 474–487 (2012)
[10] Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)
[11] Feng, S., Manmatha, R., Lavrenko, V.: Multiple Bernoulli relevance models for image and video annotation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. II–1002 (2004)
[12] Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. II–264. IEEE, Madison (2003)
[13] Freeman, W.T., Pasztor, E.C., Carmichael, O.T.: Learning low-level vision. Int. J. Comput. Vis. 40(1), 25–47 (2000)
[14] Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
[15] Grubinger, M., Clough, P., Müller, H., Deselaers, T.: The IAPR TC-12 benchmark: a new evaluation resource for visual information systems. In: International Conference on Language Resources and Evaluation, pp. 13–23 (2006)
[16] Heath, K., Gelfand, N., Ovsjanikov, M., Aanjaneya, M., Guibas, L.J.: Image webs: computing and exploiting connectivity in image collections. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3432–3439 (2010)
[17] Hochbaum, D.S., Singh, V.: An efficient algorithm for co-segmentation. In: IEEE International Conference on Computer Vision (ICCV), pp. 269–276 (2009)
[18] Jing, Y., Baluja, S.: VisualRank: applying PageRank to large-scale image search. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1877–1890 (2008)
[19] Joulin, A., Bach, F., Ponce, J.: Discriminative clustering for image co-segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1943–1950. IEEE (2010)
[20] Joulin, A., Bach, F., Ponce, J.: Multi-class cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 542–549 (2012)
[21] Karsch, K., Liu, C., Kang, S.B.: Depth extraction from video using non-parametric sampling. In: European Conference on Computer Vision (ECCV), pp. 775–788 (2012)
[22] Kim, G., Torralba, A.: Unsupervised detection of regions of interest using iterative link analysis. In: Advances in Neural Information Processing Systems (NIPS), pp. 961–969 (2009)
[23] Kim, G., Xing, E.P.: On multiple foreground cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 837–844. IEEE, Providence (2012)
[24] Kim, G., Xing, E.P.: Jointly aligning and segmenting multiple web photo streams for the inference of collective photo storylines. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 620–627 (2013)
[25] Kim, G., Xing, E.P., Fei-Fei, L., Kanade, T.: Distributed cosegmentation via submodular optimization on anisotropic diffusion. In: IEEE International Conference on Computer Vision (ICCV), pp. 169–176 (2011)
[26] Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. arXiv preprint arXiv:1210.5644 (2012)
[27] Kuettel, D., Guillaumin, M., Ferrari, V.: Segmentation propagation in ImageNet. In: European Conference on Computer Vision (ECCV), pp. 459–473 (2012)
[28] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 2169–2178. IEEE, New York (2006)
[29] Liang, L., Liu, C., Xu, Y.Q., Guo, B., Shum, H.Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. 20(3), 127–150 (2001)
[30] Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing via label transfer. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2368–2382 (2011)
[31] Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)
[32] Liu, S., Yan, S., Zhang, T., Xu, C., Liu, J., Lu, H.: Weakly supervised graph propagation towards collective image parsing. IEEE Trans. Multimedia 14(2), 361–373 (2012)
[33] Makadia, A., Pavlovic, V., Kumar, S.: Baselines for image annotation. Int. J. Comput. Vis. 90(1), 88–105 (2010)
[34] Mukherjee, L., Singh, V., Dyer, C.R.: Half-integrality based algorithms for cosegmentation of images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2028–2035. IEEE, Miami Beach (2009)
[35] Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
[36] Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
[37] Rother, C., Minka, T., Blake, A., Kolmogorov, V.: Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 993–1000 (2006)
[38] Rubinstein, M., Liu, C., Freeman, W.T.: Annotation propagation in large image databases via dense image correspondence. In: European Conference on Computer Vision (ECCV), pp. 85–99 (2012)
[39] Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1939–1946 (2013)
[40] Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1605–1614 (2006)
[41] Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1–3), 157–173 (2008)
[42] Shotton, J., Winn, J., Rother, C., Criminisi, A.: TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: European Conference on Computer Vision (ECCV), pp. 1–15 (2006)
[43] Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)
[44] Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: IEEE International Conference on Computer Vision (ICCV), vol. 1, pp. 370–377 (2005)
[45] Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25(3), 835–846 (2006)
[46] Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. Pattern Anal. Mach. Intell. 30(6), 1068–1080 (2008)
[47] Tappen, M.F., Liu, C.: A Bayesian approach to alignment-based image hallucination. In: European Conference on Computer Vision (ECCV), pp. 236–249 (2012)
[48] Tighe, J., Lazebnik, S.: SuperParsing: scalable nonparametric image parsing with superpixels. In: European Conference on Computer Vision (ECCV), pp. 352–365 (2010)
[49] Tompkin, J., Kim, K.I., Kautz, J., Theobalt, C.: Videoscapes: exploring sparse, unstructured video collections. ACM Trans. Graph. 31(4), 68 (2012)
[50] Vicente, S., Rother, C., Kolmogorov, V.: Object cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2217–2224 (2011)
[51] Vijayanarasimhan, S., Grauman, K.: Cost-sensitive active visual category learning. Int. J. Comput. Vis. 91(1), 24–44 (2011)
[52] Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: ACM Conference on Human Factors in Computing Systems (Proc. SIGCHI), pp. 319–326 (2004)
[53] Wang, X.J., Zhang, L., Liu, M., Li, Y., Ma, W.Y.: Arista: image search to annotation on billions of web photos. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2987–2994 (2010)
[54] Winn, J., Jojic, N.: Locus: learning object classes with unsupervised segmentation. In: IEEE International Conference on Computer Vision (ICCV), vol. 1, pp. 756–763 (2005)
[55] Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3485–3492 (2010)
[56] Zhu, S.C., Wu, Y., Mumford, D.: Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. Int. J. Comput. Vis. 27(2), 107–126 (1998)
[57] Zoran, D., Weiss, Y.: Natural images, Gaussian mixtures and dead leaves. In: Advances in Neural Information Processing Systems (NIPS), pp. 1736–1744 (2012)

Author information

Authors and Affiliations

Google Research, Cambridge, MA, USA
Michael Rubinstein, Ce Liu & William T. Freeman
MIT CSAIL, Cambridge, MA, USA
William T. Freeman

Corresponding author

Correspondence to Michael Rubinstein.

Editor information

Editors and Affiliations

The Open University of Israel, Raanana, Israel
Tal Hassner
Google Research, Cambridge, Massachusetts, USA
Ce Liu

About this chapter

Cite this chapter

Rubinstein, M., Liu, C., Freeman, W.T. (2016). Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_11

DOI: https://doi.org/10.1007/978-3-319-23048-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23047-4
Online ISBN: 978-3-319-23048-1
eBook Packages: Engineering, Engineering (R0)