Scene Segmentation Using the Wisdom of Crowds

  • Ian Simon
  • Steven M. Seitz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)

Abstract

Given a collection of images of a static scene taken by many different people, we identify and segment interesting objects. To solve this problem, we use the distribution of images in the collection along with a new field-of-view cue, which leverages the observation that people tend to take photos that frame an object of interest within the field of view. Hence, image features that appear together in many images are likely to be part of the same object. We evaluate the effectiveness of this cue by comparing the segmentations computed by our method against hand-labeled ones for several different models. We also show how the results of our segmentations can be used to highlight important objects in the scene and label them using noisy user-specified textual tag data. These methods are demonstrated on photos of several popular tourist sites downloaded from the Internet.
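The intuition behind the field-of-view cue can be illustrated with a small sketch. This is not the authors' model, only a toy stand-in: it assumes each photo has already been reduced to a set of feature identifiers, and it groups features whose sets of containing photos overlap strongly. The function name `cooccurrence_segments`, the Jaccard-overlap threshold `min_jaccard`, and the example feature names are all hypothetical choices made for illustration.

```python
from itertools import combinations


def cooccurrence_segments(image_features, min_jaccard=0.5):
    """Group feature ids that tend to be visible in the same photos.

    image_features: list of sets, one set of feature ids per photo.
    min_jaccard: hypothetical threshold on the overlap between the
        sets of photos in which two features appear.
    """
    # Map each feature to the set of photos that contain it.
    images_of = {}
    for image_id, features in enumerate(image_features):
        for feature in features:
            images_of.setdefault(feature, set()).add(image_id)

    # Union-find structure used to merge strongly co-occurring features.
    parent = {feature: feature for feature in images_of}

    def find(feature):
        while parent[feature] != feature:
            parent[feature] = parent[parent[feature]]  # path halving
            feature = parent[feature]
        return feature

    for a, b in combinations(images_of, 2):
        shared = len(images_of[a] & images_of[b])
        total = len(images_of[a] | images_of[b])
        if shared / total >= min_jaccard:
            parent[find(a)] = find(b)

    # Collect the merged groups as sorted lists for stable output.
    groups = {}
    for feature in images_of:
        groups.setdefault(find(feature), set()).add(feature)
    return sorted(sorted(group) for group in groups.values())


# Toy collection: three photos frame a building, two frame a fountain.
photos = [
    {"dome", "door"},
    {"dome", "door"},
    {"dome", "door", "fountain"},
    {"fountain", "statue"},
    {"fountain", "statue"},
]
print(cooccurrence_segments(photos))
# [['dome', 'door'], ['fountain', 'statue']]
```

Even though one photo shows both the building and the fountain, the features still separate into two groups, because most photographers framed one object at a time. A hard threshold on pairwise overlap is a crude substitute for a probabilistic treatment, but it shows why co-occurrence across many photos carries segmentation information.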

Keywords

Static Scene; Photo Collection; Dirichlet Process Mixture; Scene Segmentation; Ground Truth Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ian Simon (1)
  • Steven M. Seitz (1)
  1. University of Washington, USA
