Abstract
We present a principled framework for inferring pixel labels in weakly annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images. However, for large, modern image datasets, such labels are expensive to obtain and are often unavailable. We construct a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and helps resolve ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker local evidence. We apply the proposed framework to two computer vision problems: image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images that contain a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling, while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method discovers and segments objects well even in the presence of a substantial number of noise images (images that do not contain the common object), as is typical of datasets collected from Internet search.
This work was done while Michael Rubinstein was a PhD student at MIT, during two summer internships at Microsoft Research, and while Ce Liu was a researcher at Microsoft Research.
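The abstract's central mechanism, propagating annotations from images with strong local evidence to images with weak local evidence, can be illustrated with a toy label-propagation scheme. The sketch below is not the chapter's graphical model (which operates on pixels linked across images); it swaps in a simple iterative averaging on a graph whose nodes stand in for whole images. The function `propagate_labels`, its `evidence` and `confidence` inputs, and the edge weights are hypothetical illustrations chosen for this example.

```python
"""Toy label propagation on an image-similarity graph (illustrative sketch only)."""

from typing import Dict, List, Tuple


def propagate_labels(
    num_nodes: int,
    edges: List[Tuple[int, int, float]],
    evidence: Dict[int, List[float]],
    confidence: Dict[int, float],
    num_labels: int,
    iters: int = 100,
) -> List[List[float]]:
    """Blend each node's own evidence with the weighted average of its neighbors.

    Assumptions (hypothetical, not taken from the chapter):
      - evidence[i] is node i's local label distribution (sums to 1);
      - confidence[i] in [0, 1] is how strongly node i trusts that evidence;
      - nodes without evidence take their labels entirely from neighbors.
    """
    # Symmetric adjacency list built from undirected, weighted edges.
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(num_nodes)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    uniform = [1.0 / num_labels] * num_labels
    beliefs = [list(evidence.get(i, uniform)) for i in range(num_nodes)]

    for _ in range(iters):
        updated = []  # Jacobi-style update: read only last iteration's beliefs.
        for i in range(num_nodes):
            total_w = sum(w for _, w in nbrs[i])
            if total_w > 0:
                neigh = [
                    sum(w * beliefs[j][k] for j, w in nbrs[i]) / total_w
                    for k in range(num_labels)
                ]
            else:
                neigh = beliefs[i]  # Isolated node: no neighbor information.
            c = confidence.get(i, 0.0) if i in evidence else 0.0
            local = evidence.get(i, uniform)
            # Convex combination, so each belief stays a valid distribution.
            updated.append(
                [c * local[k] + (1.0 - c) * neigh[k] for k in range(num_labels)]
            )
        beliefs = updated
    return beliefs


if __name__ == "__main__":
    # A chain of four images 0-1-2-3 with hypothetical similarity weights.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    evidence = {0: [1.0, 0.0], 3: [0.0, 1.0]}  # confident at 0, weak at 3
    confidence = {0: 0.9, 3: 0.2}
    result = propagate_labels(4, edges, evidence, confidence, num_labels=2)
    for node, belief in enumerate(result):
        print(node, [round(p, 4) for p in belief])
```

In this toy chain, node 3's weak evidence for label 1 is outweighed by node 0's confident evidence for label 0, which is the qualitative behavior the abstract describes; the chapter's actual model uses its own energy terms and inference procedure.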
Notes
1. In our experiments, we did not notice a significant difference in the results when computing the nearest-neighbor set using pyramid matching [28] instead of Gist.
2. Concurrently with the release of our paper, Kuettel et al. [27] improved the state-of-the-art precision on the iCoseg dataset (91.4 %).
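Note 1 refers to each image's nearest-neighbor set, computed by comparing global image descriptors. A minimal sketch of that step follows, assuming the descriptors (for example, Gist vectors) have already been extracted and that plain Euclidean distance is used to compare them; descriptor extraction and pyramid matching are not shown, and `knn_sets` is a hypothetical helper, not code from the chapter.

```python
"""Toy k-nearest-neighbor sets over precomputed global image descriptors."""

import math
from typing import Dict, List, Sequence


def knn_sets(descriptors: Sequence[Sequence[float]], k: int) -> Dict[int, List[int]]:
    """Return, for each image index, the indices of its k nearest other images.

    Assumes every descriptor has the same length. Distance is Euclidean
    (an assumption for this sketch); ties are broken by the smaller index.
    """
    n = len(descriptors)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of images")
    neighbors: Dict[int, List[int]] = {}
    for i in range(n):
        # (distance, index) pairs sort by distance first, then by index.
        dists = sorted(
            (math.dist(descriptors[i], descriptors[j]), j)
            for j in range(n)
            if j != i
        )
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


if __name__ == "__main__":
    # Hypothetical 3-D descriptors standing in for real Gist vectors.
    descs = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
    print(knn_sets(descs, k=1))
```

Replacing Gist with pyramid matching, as the note describes, would amount to changing the measure used to rank candidate images (a similarity to maximize rather than a distance to minimize); the selection of the top k candidates stays the same.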
References
[1] Bagon, S., Brostovski, O., Galun, M., Irani, M.: Detecting and sketching the common. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33–40. IEEE, San Francisco (2010)
[2] Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.: PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 28(3), 24 (2009)
[3] Batra, D., Kowdle, A., Parikh, D., Luo, J., Chen, T.: iCoseg: interactive co-segmentation with intelligent scribble guidance. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3169–3176 (2010)
[4] Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
[5] Cheng, M.M., Zhang, G.X., Mitra, N.J., Huang, X., Hu, S.M.: Global contrast based salient region detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409–416 (2011)
[6] Collins, M.D., Xu, J., Grady, L., Singh, V.: Random walks based multi-image segmentation: quasiconvexity results and GPU-based solutions. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1656–1663 (2012)
[7] Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)
[8] Delong, A., Gorelick, L., Schmidt, F.R., Veksler, O., Boykov, Y.: Interactive segmentation with super-labels. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 147–162 (2011)
[9] Faktor, A., Irani, M.: Clustering by composition – unsupervised discovery of image categories. In: European Conference on Computer Vision (ECCV), pp. 474–487 (2012)
[10] Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)
[11] Feng, S., Manmatha, R., Lavrenko, V.: Multiple Bernoulli relevance models for image and video annotation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. II–1002 (2004)
[12] Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. II–264. IEEE, Madison (2003)
[13] Freeman, W.T., Pasztor, E.C., Carmichael, O.T.: Learning low-level vision. Int. J. Comput. Vis. 40(1), 25–47 (2000)
[14] Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
[15] Grubinger, M., Clough, P., Müller, H., Deselaers, T.: The IAPR TC-12 benchmark: a new evaluation resource for visual information systems. In: International Conference on Language Resources and Evaluation, pp. 13–23 (2006)
[16] Heath, K., Gelfand, N., Ovsjanikov, M., Aanjaneya, M., Guibas, L.J.: Image webs: computing and exploiting connectivity in image collections. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3432–3439 (2010)
[17] Hochbaum, D.S., Singh, V.: An efficient algorithm for co-segmentation. In: IEEE International Conference on Computer Vision (ICCV), pp. 269–276 (2009)
[18] Jing, Y., Baluja, S.: VisualRank: applying PageRank to large-scale image search. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1877–1890 (2008)
[19] Joulin, A., Bach, F., Ponce, J.: Discriminative clustering for image co-segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1943–1950. IEEE (2010)
[20] Joulin, A., Bach, F., Ponce, J.: Multi-class cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 542–549 (2012)
[21] Karsch, K., Liu, C., Kang, S.B.: Depth extraction from video using non-parametric sampling. In: European Conference on Computer Vision (ECCV), pp. 775–788 (2012)
[22] Kim, G., Torralba, A.: Unsupervised detection of regions of interest using iterative link analysis. In: Advances in Neural Information Processing Systems (NIPS), pp. 961–969 (2009)
[23] Kim, G., Xing, E.P.: On multiple foreground cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 837–844. IEEE, Providence (2012)
[24] Kim, G., Xing, E.P.: Jointly aligning and segmenting multiple web photo streams for the inference of collective photo storylines. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 620–627 (2013)
[25] Kim, G., Xing, E.P., Fei-Fei, L., Kanade, T.: Distributed cosegmentation via submodular optimization on anisotropic diffusion. In: IEEE International Conference on Computer Vision (ICCV), pp. 169–176 (2011)
[26] Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. arXiv preprint arXiv:1210.5644 (2012)
[27] Kuettel, D., Guillaumin, M., Ferrari, V.: Segmentation propagation in ImageNet. In: European Conference on Computer Vision (ECCV), pp. 459–473 (2012)
[28] Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 2169–2178. IEEE, New York (2006)
[29] Liang, L., Liu, C., Xu, Y.Q., Guo, B., Shum, H.Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. 20(3), 127–150 (2001)
[30] Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing via label transfer. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2368–2382 (2011)
[31] Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)
[32] Liu, S., Yan, S., Zhang, T., Xu, C., Liu, J., Lu, H.: Weakly supervised graph propagation towards collective image parsing. IEEE Trans. Multimedia 14(2), 361–373 (2012)
[33] Makadia, A., Pavlovic, V., Kumar, S.: Baselines for image annotation. Int. J. Comput. Vis. 90(1), 88–105 (2010)
[34] Mukherjee, L., Singh, V., Dyer, C.R.: Half-integrality based algorithms for cosegmentation of images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2028–2035. IEEE, Miami Beach (2009)
[35] Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
[36] Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
[37] Rother, C., Minka, T., Blake, A., Kolmogorov, V.: Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 993–1000 (2006)
[38] Rubinstein, M., Liu, C., Freeman, W.T.: Annotation propagation in large image databases via dense image correspondence. In: European Conference on Computer Vision (ECCV), pp. 85–99 (2012)
[39] Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1939–1946 (2013)
[40] Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1605–1614 (2006)
[41] Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: LabelMe: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1–3), 157–173 (2008)
[42] Shotton, J., Winn, J., Rother, C., Criminisi, A.: TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: European Conference on Computer Vision (ECCV), pp. 1–15 (2006)
[43] Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)
[44] Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: IEEE International Conference on Computer Vision (ICCV), vol. 1, pp. 370–377 (2005)
[45] Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25(3), 835–846 (2006)
[46] Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. Pattern Anal. Mach. Intell. 30(6), 1068–1080 (2008)
[47] Tappen, M.F., Liu, C.: A Bayesian approach to alignment-based image hallucination. In: European Conference on Computer Vision (ECCV), pp. 236–249 (2012)
[48] Tighe, J., Lazebnik, S.: SuperParsing: scalable nonparametric image parsing with superpixels. In: European Conference on Computer Vision (ECCV), pp. 352–365 (2010)
[49] Tompkin, J., Kim, K.I., Kautz, J., Theobalt, C.: Videoscapes: exploring sparse, unstructured video collections. ACM Trans. Graph. 31(4), 68 (2012)
[50] Vicente, S., Rother, C., Kolmogorov, V.: Object cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2217–2224 (2011)
[51] Vijayanarasimhan, S., Grauman, K.: Cost-sensitive active visual category learning. Int. J. Comput. Vis. 91(1), 24–44 (2011)
[52] Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: ACM Conference on Human Factors in Computing Systems (Proc. SIGCHI), pp. 319–326 (2004)
[53] Wang, X.J., Zhang, L., Liu, M., Li, Y., Ma, W.Y.: ARISTA - image search to annotation on billions of web photos. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2987–2994 (2010)
[54] Winn, J., Jojic, N.: LOCUS: learning object classes with unsupervised segmentation. In: IEEE International Conference on Computer Vision (ICCV), vol. 1, pp. 756–763 (2005)
[55] Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3485–3492 (2010)
[56] Zhu, S.C., Wu, Y., Mumford, D.: Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. Int. J. Comput. Vis. 27(2), 107–126 (1998)
[57] Zoran, D., Weiss, Y.: Natural images, Gaussian mixtures and dead leaves. In: Advances in Neural Information Processing Systems (NIPS), pp. 1736–1744 (2012)
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Rubinstein, M., Liu, C., Freeman, W.T. (2016). Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_11
DOI: https://doi.org/10.1007/978-3-319-23048-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23047-4
Online ISBN: 978-3-319-23048-1
eBook Packages: Engineering, Engineering (R0)