Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework

Taylor, Brian; Ayvaci, Alper; Ravichandran, Avinash; Soatto, Stefano

doi:10.1007/978-3-642-40395-8_15


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8081)

Included in the following conference series:

International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition


Abstract

We describe an approach to incorporate scene topology and semantics into pixel-level object detection and localization. Our method requires video to determine occlusion regions and thence local depth ordering, and any visual recognition scheme that provides a score at local image regions, for instance object detection probabilities. We set up a cost functional that incorporates occlusion cues induced by object boundaries, label consistency and recognition priors, and solve it using a convex optimization scheme. We show that our method improves localization accuracy of existing recognition approaches, or equivalently provides semantic labels to pixel-level localization and segmentation.
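As a rough illustration of the kind of cost functional the abstract describes, the sketch below labels a single row of pixels. It is not the authors' formulation: the energy, the function name `segment_row`, the weights `lam` and `0.05`, and the toy scores are all assumptions chosen for brevity. A relaxed label u_i in [0, 1] is driven by a unary term built from detector scores (the recognition prior), a quadratic smoothness term (label consistency), and a reduced smoothness weight wherever an occlusion boundary is reported. The resulting problem is a convex quadratic with box constraints, solved here by projected gradient descent.

```python
import numpy as np

def segment_row(scores, occlusion_edges, lam=1.0, iters=500):
    """Minimize sum_i c_i*u_i + lam * sum_i w_i*(u[i+1] - u[i])**2 over u in [0, 1]^n.

    scores          -- hypothetical per-pixel "object" probabilities from a detector
    occlusion_edges -- indices i where an occlusion boundary lies between i and i+1
    """
    scores = np.asarray(scores, dtype=float)
    c = 0.5 - scores                        # unary cost: negative where the detector favors "object"
    w = np.ones(scores.size - 1)            # pairwise weights (label consistency)
    w[np.asarray(occlusion_edges, dtype=int)] = 0.05   # assumed: weak coupling across occlusions
    # Gradient Lipschitz bound: 2*lam*max(w)*4 for a path graph; guard against lam == 0.
    step = 1.0 / max(8.0 * lam * w.max(), 1.0)
    u = np.full(scores.size, 0.5)
    for _ in range(iters):
        d = w * np.diff(u)                  # weighted forward differences
        grad = c.copy()
        grad[:-1] -= 2.0 * lam * d          # derivative with respect to u[i]
        grad[1:] += 2.0 * lam * d           # derivative with respect to u[i+1]
        u = np.clip(u - step * grad, 0.0, 1.0)   # project back onto the box [0, 1]
    return u

# Toy row: background, an object with one weak detection (index 6), background.
scores = [0.2] * 4 + [0.8, 0.8, 0.4, 0.8, 0.8] + [0.2] * 3
u_smooth = segment_row(scores, occlusion_edges=[3, 8], lam=1.0)
u_unary = segment_row(scores, occlusion_edges=[3, 8], lam=0.0)
labels_smooth = (u_smooth > 0.5).astype(int).tolist()
labels_unary = (u_unary > 0.5).astype(int).tolist()
print(labels_smooth)
print(labels_unary)
```

In this toy setting the unary term alone mislabels the weak detection inside the object, while the consistency term recovers it and the occlusion boundaries keep the object label from leaking into the background. The paper's actual functional and solver may differ substantially from this sketch.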



Author information

Authors and Affiliations

University of California, Los Angeles, USA
Brian Taylor, Avinash Ravichandran & Stefano Soatto
Honda Research Institute, USA
Alper Ayvaci


Editor information

Editors and Affiliations

Centre for Mathematical Sciences, Lund University, Sweden
Anders Heyden, Fredrik Kahl, Carl Olsson & Magnus Oskarsson
Dept. of Mathematics, University of Bergen, Johannes Brunsgate 12, 5007, Bergen, Norway
Xue-Cheng Tai


About this paper

Cite this paper

Taylor, B., Ayvaci, A., Ravichandran, A., Soatto, S. (2013). Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework. In: Heyden, A., Kahl, F., Olsson, C., Oskarsson, M., Tai, XC. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2013. Lecture Notes in Computer Science, vol 8081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40395-8_15


DOI: https://doi.org/10.1007/978-3-642-40395-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40394-1
Online ISBN: 978-3-642-40395-8
eBook Packages: Computer Science, Computer Science (R0)
