Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence

Chapter in Dense Image Correspondences for Computer Vision

Abstract

We present a principled framework for inferring pixel labels in weakly annotated image datasets. Most previous example-based approaches to computer vision rely on a large corpus of densely labeled images. For large, modern image datasets, however, such labels are expensive to obtain and are often unavailable. We establish a large-scale graphical model spanning all labeled and unlabeled images, then solve it to infer pixel labels jointly for all images in the dataset while enforcing consistent annotations over similar visual patterns. This model requires significantly less labeled data and helps resolve ambiguities by propagating inferred annotations from images with stronger local visual evidence to images with weaker local evidence. We apply the framework to two computer vision problems: image annotation with semantic segmentation, and object discovery and co-segmentation (segmenting multiple images containing a common object). Extensive numerical evaluations and comparisons show that our method consistently outperforms the state of the art in automatic annotation and semantic labeling while requiring significantly less labeled data. In contrast to previous co-segmentation techniques, our method discovers and segments objects well even in the presence of substantial numbers of noise images (images not containing the common object), as is typical for datasets collected from Internet search.
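The propagation idea summarized above can be illustrated with a deliberately small sketch. It is not the chapter's graphical model or its solver: the function name `propagate_labels`, the hand-built graph, the averaging update, and all numbers are assumptions made purely for illustration. Nodes stand in for pixels or regions, and edges stand in for links between similar visual patterns.

```python
# Toy illustration only: NOT the chapter's model or inference procedure.
# Nodes stand in for pixels/regions; edges stand in for links between
# visually similar patterns (a hypothetical, hand-built graph).

def propagate_labels(unary, edges, weight=1.0, iters=50):
    """Blend each node's local label scores with those of linked nodes.

    unary: list of per-node score lists (higher = stronger local evidence).
    edges: list of (i, j) undirected links between similar nodes.
    Returns the index of the highest-scoring label at each node.
    """
    n = len(unary)
    num_labels = len(unary[0])
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    scores = [list(u) for u in unary]
    for _ in range(iters):
        updated = []
        for i in range(n):
            row = []
            for k in range(num_labels):
                pulled = sum(scores[j][k] for j in neighbours[i])
                # Local evidence plus weighted support from linked nodes,
                # normalised so the scores stay bounded.
                row.append((unary[i][k] + weight * pulled)
                           / (1.0 + weight * len(neighbours[i])))
            updated.append(row)
        scores = updated
    return [max(range(num_labels), key=lambda k: s[k]) for s in scores]


# Node 0 has strong evidence for label 1, node 1 is ambiguous, and node 2
# weakly prefers label 0 and is not linked to anything.
unary = [[0.1, 0.9], [0.5, 0.5], [0.6, 0.4]]
edges = [(0, 1)]
labels = propagate_labels(unary, edges)
print(labels)  # [1, 1, 0]
```

In this toy setting the ambiguous node 1 inherits label 1 from its strongly supported neighbor, while the unlinked node 2 keeps the label favored by its own weak evidence.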

This work was done while Michael Rubinstein was a PhD student at MIT, during two summer internships at Microsoft Research, and while Ce Liu was a researcher at Microsoft Research.

Notes

  1. In our experiments, we did not notice a significant difference in the results when computing the nearest-neighbor set using pyramid matching [28] instead of Gist.

  2. Concurrently with the release of our paper, Kuettel et al. [27] improved the state-of-the-art precision on the iCoseg dataset (91.4 %).

References

  1. Bagon, S., Brostovski, O., Galun, M., Irani, M.: Detecting and sketching the common. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 33–40. IEEE, San Francisco (2010)

  2. Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.: Patchmatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 28(3), 24 (2009)

  3. Batra, D., Kowdle, A., Parikh, D., Luo, J., Chen, T.: icoseg: interactive co-segmentation with intelligent scribble guidance. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3169–3176 (2010)

  4. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  5. Cheng, M.M., Zhang, G.X., Mitra, N.J., Huang, X., Hu, S.M.: Global contrast based salient region detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409–416 (2011)

  6. Collins, M.D., Xu, J., Grady, L., Singh, V.: Random walks based multi-image segmentation: Quasiconvexity results and gpu-based solutions. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1656–1663 (2012)

  7. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893 (2005)

  8. Delong, A., Gorelick, L., Schmidt, F.R., Veksler, O., Boykov, Y.: Interactive segmentation with super-labels. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 147–162 (2011)

  9. Faktor, A., Irani, M.: Clustering by composition–unsupervised discovery of image categories. In: European Conference on Computer Vision (ECCV), pp. 474–487 (2012)

  10. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)

  11. Feng, S., Manmatha, R., Lavrenko, V.: Multiple bernoulli relevance models for image and video annotation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. II–1002 (2004)

  12. Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. II–264. IEEE, Madison (2003)

  13. Freeman, W.T., Pasztor, E.C., Carmichael, O.T.: Learning low-level vision. Int. J. Comput. Vis. 40(1), 25–47 (2000)

  14. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)

  15. Grubinger, M., Clough, P., Müller, H., Deselaers, T.: The iapr tc-12 benchmark: a new evaluation resource for visual information systems. In: International Conference on Language Resources and Evaluation, pp. 13–23 (2006)

  16. Heath, K., Gelfand, N., Ovsjanikov, M., Aanjaneya, M., Guibas, L.J.: Image webs: computing and exploiting connectivity in image collections. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3432–3439 (2010)

  17. Hochbaum, D.S., Singh, V.: An efficient algorithm for co-segmentation. In: IEEE International Conference on Computer Vision (ICCV), pp. 269–276 (2009)

  18. Jing, Y., Baluja, S.: Visualrank: applying pagerank to large-scale image search. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1877–1890 (2008)

  19. Joulin, A., Bach, F., Ponce, J.: Discriminative clustering for image co-segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 1943–1950 (2010)

  20. Joulin, A., Bach, F., Ponce, J.: Multi-class cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 542–549 (2012)

  21. Karsch, K., Liu, C., Kang, S.B.: Depth extraction from video using non-parametric sampling. In: European Conference on Computer Vision (ECCV), pp. 775–788 (2012)

  22. Kim, G., Torralba, A.: Unsupervised detection of regions of interest using iterative link analysis. In: Advances in Neural Information Processing Systems (NIPS), pp. 961–969 (2009)

  23. Kim, G., Xing, E.P.: On multiple foreground cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 837–844. IEEE, Providence (2012)

  24. Kim, G., Xing, E.P.: Jointly aligning and segmenting multiple web photo streams for the inference of collective photo storylines. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 620–627 (2013)

  25. Kim, G., Xing, E.P., Fei-Fei, L., Kanade, T.: Distributed cosegmentation via submodular optimization on anisotropic diffusion. In: IEEE International Conference on Computer Vision (ICCV), pp. 169–176 (2011)

  26. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected crfs with gaussian edge potentials. arXiv preprint (2012) [arXiv:1210.5644]

  27. Kuettel, D., Guillaumin, M., Ferrari, V.: Segmentation propagation in imagenet. In: European Conference on Computer Vision (ECCV), pp. 459–473 (2012)

  28. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 2169–2178. IEEE, New York (2006)

  29. Liang, L., Liu, C., Xu, Y.Q., Guo, B., Shum, H.Y.: Real-time texture synthesis by patch-based sampling. ACM Trans. Graph. 20(3), 127–150 (2001)

  30. Liu, C., Yuen, J., Torralba, A.: Nonparametric scene parsing via label transfer. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2368–2382 (2011)

  31. Liu, C., Yuen, J., Torralba, A.: Sift flow: dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994 (2011)

  32. Liu, S., Yan, S., Zhang, T., Xu, C., Liu, J., Lu, H.: Weakly supervised graph propagation towards collective image parsing. IEEE Trans. Multimedia 14(2), 361–373 (2012)

  33. Makadia, A., Pavlovic, V., Kumar, S.: Baselines for image annotation. Int. J. Comput. Vis. 90(1), 88–105 (2010)

  34. Mukherjee, L., Singh, V., Dyer, C.R.: Half-integrality based algorithms for cosegmentation of images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2028–2035. IEEE, Miami Beach (2009)

  35. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

  36. Rother, C., Kolmogorov, V., Blake, A.: Grabcut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)

  37. Rother, C., Minka, T., Blake, A., Kolmogorov, V.: Cosegmentation of image pairs by histogram matching-incorporating a global constraint into mrfs. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 993–1000 (2006)

  38. Rubinstein, M., Liu, C., Freeman, W.T.: Annotation propagation in large image databases via dense image correspondence. In: European Conference on Computer Vision (ECCV), pp. 85–99 (2012)

  39. Rubinstein, M., Joulin, A., Kopf, J., Liu, C.: Unsupervised joint object discovery and segmentation in internet images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1939–1946 (2013)

  40. Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1605–1614 (2006)

  41. Russell, B.C., Torralba, A., Murphy, K.P., Freeman, W.T.: Labelme: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1–3), 157–173 (2008)

  42. Shotton, J., Winn, J., Rother, C., Criminisi, A.: Textonboost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: European Conference on Computer Vision (ECCV), pp. 1–15 (2006)

  43. Shotton, J., Johnson, M., Cipolla, R.: Semantic texton forests for image categorization and segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008)

  44. Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering objects and their location in images. In: IEEE International Conference on Computer Vision (ICCV), vol. 1, pp. 370–377 (2005)

  45. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3d. ACM Trans. Graph. 25(3), 835–846 (2006)

  46. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for markov random fields with smoothness-based priors. IEEE Trans. Pattern Anal. Mach. Intell. 30(6), 1068–1080 (2008)

  47. Tappen, M.F., Liu, C.: A bayesian approach to alignment-based image hallucination. In: European Conference on Computer Vision (ECCV), pp. 236–249 (2012)

  48. Tighe, J., Lazebnik, S.: Superparsing: scalable nonparametric image parsing with superpixels. In: European Conference on Computer Vision (ECCV), pp. 352–365 (2010)

  49. Tompkin, J., Kim, K.I., Kautz, J., Theobalt, C.: Videoscapes: exploring sparse, unstructured video collections. ACM Trans. Graph. 31(4), 68 (2012)

  50. Vicente, S., Rother, C., Kolmogorov, V.: Object cosegmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2217–2224 (2011)

  51. Vijayanarasimhan, S., Grauman, K.: Cost-sensitive active visual category learning. Int. J. Comput. Vis. 91(1), 24–44 (2011)

  52. Von Ahn, L., Dabbish, L.: Labeling images with a computer game. In: ACM Conference on Human Factors in Computing Systems (Proc. SIGCHI), pp. 319–326 (2004)

  53. Wang, X.J., Zhang, L., Liu, M., Li, Y., Ma, W.Y.: Arista-image search to annotation on billions of web photos. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2987–2994 (2010)

  54. Winn, J., Jojic, N.: Locus: learning object classes with unsupervised segmentation. In: IEEE International Conference on Computer Vision (ICCV), vol. 1, pp. 756–763 (2005)

  55. Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene recognition from abbey to zoo. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3485–3492 (2010)

  56. Zhu, S.C., Wu, Y., Mumford, D.: Filters, random fields and maximum entropy (frame): towards a unified theory for texture modeling. Int. J. Comput. Vis. 27(2), 107–126 (1998)

  57. Zoran, D., Weiss, Y.: Natural images, gaussian mixtures and dead leaves. In: Advances in Neural Information Processing Systems (NIPS), pp. 1736–1744 (2012)

Author information

Corresponding author

Correspondence to Michael Rubinstein.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Rubinstein, M., Liu, C., Freeman, W.T. (2016). Joint Inference in Weakly-Annotated Image Datasets via Dense Correspondence. In: Hassner, T., Liu, C. (eds) Dense Image Correspondences for Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-319-23048-1_11

  • DOI: https://doi.org/10.1007/978-3-319-23048-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23047-4

  • Online ISBN: 978-3-319-23048-1

  • eBook Packages: Engineering, Engineering (R0)
