Efficient Object Localization and Segmentation in Weakly Labeled Videos

Rochan, Mrigank; Wang, Yang

doi:10.1007/978-3-319-14249-4_17


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8887)

Included in the following conference series: International Symposium on Visual Computing


Abstract

In this paper, we tackle the problem of efficiently segmenting objects in weakly labeled videos. Internet videos (e.g., on YouTube) are often associated with a semantic tag describing the main object in the video. However, this tag provides no spatial or temporal information about the object, so these videos are only weakly labeled. We propose a novel and efficient approach that localizes the object of interest within the video and performs pixel-level segmentation. Given a video with an object tag, our method automatically localizes the object and segments it from the background in each frame. It combines object appearance modeling and temporal consistency among frames in a principled framework. Because it requires neither user input nor object detectors, it can potentially be applied to videos of any object category. We evaluate our method on a dataset consisting of more than 100 video shots covering 10 object categories. Our experimental results show that our method outperforms the baseline approaches.
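
The abstract does not spell out how appearance and temporal consistency are combined, so the following is only an illustrative sketch and not the authors' formulation. It assumes each frame comes with a few candidate boxes and a per-box appearance score (both hypothetical inputs), and picks one box per frame by dynamic programming so that the sum of appearance scores plus a weighted overlap between consecutive boxes is maximal. The names `iou`, `select_boxes`, and `weight` are invented for this example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes: List[List[Box]],
                 scores: List[List[float]],
                 weight: float = 1.0) -> List[int]:
    """Return one candidate index per frame, maximizing
    sum(appearance score) + weight * sum(IoU of consecutive picks)."""
    if not boxes:
        return []
    # best[t][j]: best total score of any path ending at candidate j of frame t
    best = [list(scores[0])]
    # back[t-1][j]: index in frame t-1 that precedes candidate j of frame t
    back: List[List[int]] = []
    for t in range(1, len(boxes)):
        cur, ptr = [], []
        for j, box_j in enumerate(boxes[t]):
            totals = [best[t - 1][i] + weight * iou(box_i, box_j)
                      for i, box_i in enumerate(boxes[t - 1])]
            i_star = max(range(len(totals)), key=totals.__getitem__)
            cur.append(totals[i_star] + scores[t][j])
            ptr.append(i_star)
        best.append(cur)
        back.append(ptr)
    # Trace the best path backwards from the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy input: two candidates per frame, three frames (made-up numbers).
    far = (50.0, 50.0, 60.0, 60.0)
    boxes = [[(0.0, 0.0, 10.0, 10.0), far],
             [(1.0, 1.0, 11.0, 11.0), far],
             [(2.0, 2.0, 12.0, 12.0), far]]
    scores = [[1.0, 0.2], [0.4, 0.5], [1.0, 0.1]]
    print(select_boxes(boxes, scores, weight=1.0))
    print(select_boxes(boxes, scores, weight=0.0))
```

With `weight=0.0` the selection reduces to taking the highest-scoring box in each frame independently; a positive weight favors tracks that move smoothly, which is the general intuition behind adding a temporal consistency term. A pixel-level segmentation step would then operate on the selected boxes and is outside the scope of this sketch.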



Author information

Authors and Affiliations

Department of Computer Science, University of Manitoba, Canada
Mrigank Rochan & Yang Wang


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Nevada at Reno, USA
George Bebis
NASA Ames Research Center, Moffett Field, CA, USA
Richard Boyle
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Bahram Parvin
Desert Research Institute, Reno, NV, USA
Darko Koracin
The University of Texas at Dallas, 75080, Richardson, TX, USA
Ryan McMahan
NextGen Interactions, 27604, Raleigh, NC, USA
Jason Jerald
Indiana University, 46202, Indianapolis, IN, USA
Hui Zhang
Microsoft Research, 1 Microsoft Way, 98052, Redmond, WA, USA
Steven M. Drucker
University of Delaware, 19716-2712, Newark, DE, USA
Chandra Kambhamettu
Intel Corp., 95054, Santa Clara, CA, USA
Maha El Choubassi
Computer Graphics and Interactive Media Lab, Department of Computer Science, University of Houston, 77004, Houston, TX, USA
Zhigang Deng
NVIDIA, 34788, Leesburg, FL, USA
Mark Carlson

About this paper

Cite this paper

Rochan, M., Wang, Y. (2014). Efficient Object Localization and Segmentation in Weakly Labeled Videos. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2014. Lecture Notes in Computer Science, vol 8887. Springer, Cham. https://doi.org/10.1007/978-3-319-14249-4_17

DOI: https://doi.org/10.1007/978-3-319-14249-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14248-7
Online ISBN: 978-3-319-14249-4
eBook Packages: Computer Science, Computer Science (R0)
