European Conference on Computer Vision

ECCV 2012: Computer Vision – ECCV 2012. Workshops and Demonstrations, pp 198–208

Weakly Supervised Learning of Object Segmentations from Web-Scale Video

  • Glenn Hartmann
  • Matthias Grundmann
  • Judy Hoffman
  • David Tsai
  • Vivek Kwatra
  • Omid Madani
  • Sudheendra Vijayanarasimhan
  • Irfan Essa
  • James Rehg
  • Rahul Sukthankar

Part of the Lecture Notes in Computer Science book series (LNIP, volume 7583)

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained from the segment-level classifiers are then refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluation against a ground-truthed subset of 50,000 frames with pixel-level annotations confirms that our proposed methods can learn good object masks just by watching YouTube.
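
The segment-level step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each spatiotemporal segment has already been reduced to a feature vector, uses plain logistic regression as a stand-in classifier, and omits the graph-cut refinement entirely. The function names, the two-dimensional toy features, and the "dog" tag assignment are all hypothetical. The idea shown is only that every segment inherits its video's tag as a noisy label, and that segments which appear only in tagged videos end up with the highest scores.

```python
import numpy as np

def train_segment_classifier(features, video_ids, tagged_videos, steps=500, lr=0.1):
    """Fit a logistic-regression scorer using video-level tags as segment labels.

    Every segment inherits the tag of its video, so the positive set is noisy:
    background segments from a tagged video are labeled positive as well.
    """
    X = np.asarray(features, dtype=float)
    y = np.array([1.0 if v in tagged_videos else 0.0 for v in video_ids])
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))    # predicted probabilities
        w -= lr * (Xb.T @ (p - y)) / len(y)    # gradient step on the log loss
    return w

def score_segments(w, features):
    """Return a linear score per segment; higher means more object-like."""
    X = np.asarray(features, dtype=float)
    return X @ w[:-1] + w[-1]

# Toy data (hypothetical): 4 videos, of which videos 0 and 1 carry the tag "dog".
background = np.array([[0.1, -0.2], [-0.3, 0.2], [0.2, 0.1], [-0.1, -0.1]])
dog = np.array([[3.0, 2.8], [2.9, 3.2]])
tagged = {0, 1}

features, video_ids, is_object = [], [], []
for vid in range(4):
    segs = [(f, False) for f in background]
    if vid in tagged:
        segs += [(f, True) for f in dog]       # object segments occur only in tagged videos
    for f, obj in segs:
        features.append(f)
        video_ids.append(vid)
        is_object.append(obj)                  # ground truth, used only for checking

features = np.array(features)
is_object = np.array(is_object)

w = train_segment_classifier(features, video_ids, tagged)
scores = score_segments(w, features)
```

In this toy setting the background segments look the same in tagged and untagged videos, so the classifier cannot separate them and the object segments receive the highest scores. In a full pipeline, such high-scoring segments would play the role of the object seeds that a later refinement stage cleans up.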

Keywords

  • Object Segmentation
  • Visual Concept
  • Video Segmentation
  • Multiple Instance Learning
  • Video Stabilization

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Authors and Affiliations

  1. Google Research, USA

    Glenn Hartmann, Vivek Kwatra, Omid Madani, Sudheendra Vijayanarasimhan & Rahul Sukthankar

  2. Georgia Institute of Technology, USA

    Matthias Grundmann, David Tsai, Irfan Essa & James Rehg

  3. University of California, Berkeley, USA

    Judy Hoffman

Editor information

Editors and Affiliations

  1. Dipartimento di Ingegneria Elettrica, Gestionale e Meccanica (DIEGM), Università degli Studi di Udine, Via delle Scienze, 208, 33100, Udine, Italy

    Andrea Fusiello

  2. IIT Istituto Italiano di Tecnologia, Via Morego 30, 16163, Genoa, Italy

    Vittorio Murino

  3. Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Modena e Reggio Emilia, Strada Vignolese, 905, 41125, Modena, Italy

    Rita Cucchiara

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hartmann, G. et al. (2012). Weakly Supervised Learning of Object Segmentations from Web-Scale Video. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33863-2_20

  • DOI: https://doi.org/10.1007/978-3-642-33863-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33862-5

  • Online ISBN: 978-3-642-33863-2

  • eBook Packages: Computer Science, Computer Science (R0)
