Efficient Object Localization and Segmentation in Weakly Labeled Videos

  • Conference paper
Advances in Visual Computing (ISVC 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8887)

Abstract

In this paper, we tackle the problem of efficiently segmenting objects in weakly labeled videos. Internet videos (e.g., on YouTube) are often associated with a semantic tag describing the main object in the video. However, this tag provides no spatial or temporal information about where or when the object appears, so such videos are only weakly labeled. We propose a novel and efficient approach that localizes the object of interest within the video and performs pixel-level segmentation. Given a video with an object tag, our method automatically localizes the object and segments it from the background in each frame. It combines object appearance modeling and temporal consistency among frames in a principled framework. Because it requires neither user input nor object detectors, it can potentially be applied to videos of any object category. We evaluate our method on a dataset of more than 100 video shots spanning 10 object categories. Our experimental results show that our method outperforms baseline approaches.
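
The abstract does not spell out how appearance and temporal consistency are combined, so the following is only an illustrative sketch under stated assumptions, not the paper's algorithm. It assumes each frame already has a set of candidate boxes with per-box appearance scores, and it picks one box per frame by maximizing the sum of appearance scores plus an overlap (IoU) reward between boxes chosen in consecutive frames, using a chain dynamic program. The names `iou`, `select_boxes`, and `smooth_weight` are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes, scores, smooth_weight=1.0):
    """Pick one candidate box per frame (hypothetical helper).

    boxes[t]  : list of candidate boxes in frame t
    scores[t] : appearance score of each candidate in frame t (assumed given)
    Maximizes sum of scores + smooth_weight * sum of IoU between the boxes
    chosen in consecutive frames, via Viterbi-style dynamic programming.
    Returns the list of chosen candidate indices, one per frame.
    """
    n = len(boxes)
    best = [np.asarray(scores[0], dtype=float)]  # best[t][k]: best total ending at box k
    back = []  # back[t-1][k]: best predecessor in frame t-1 for box k in frame t
    for t in range(1, n):
        # pair[k, j] = overlap between box k in frame t and box j in frame t-1
        pair = np.array([[iou(p, q) for p in boxes[t - 1]] for q in boxes[t]])
        total = best[-1][None, :] + smooth_weight * pair
        back.append(total.argmax(axis=1))
        best.append(np.asarray(scores[t], dtype=float) + total.max(axis=1))

    # Backtrack from the best final box to recover the full path.
    path = [int(best[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t - 1][path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    a, b = (0, 0, 10, 10), (50, 50, 60, 60)
    frames = [[a, b], [a, b], [a, b]]
    appearance = [[0.9, 0.1], [0.4, 0.5], [0.8, 0.2]]
    print(select_boxes(frames, appearance, smooth_weight=1.0))
    print(select_boxes(frames, appearance, smooth_weight=0.0))
```

With the temporal term switched off (`smooth_weight=0.0`) the selection reduces to picking the highest-scoring box in each frame independently; a positive weight lets a slightly lower-scoring box win when it keeps the track consistent. Pixel-level segmentation inside the chosen boxes is outside the scope of this sketch.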

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rochan, M., Wang, Y. (2014). Efficient Object Localization and Segmentation in Weakly Labeled Videos. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2014. Lecture Notes in Computer Science, vol 8887. Springer, Cham. https://doi.org/10.1007/978-3-319-14249-4_17

  • DOI: https://doi.org/10.1007/978-3-319-14249-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14248-7

  • Online ISBN: 978-3-319-14249-4

  • eBook Packages: Computer Science, Computer Science (R0)
