Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework

  • Conference paper
Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 2013)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8081)

Abstract

We describe an approach for incorporating scene topology and semantics into pixel-level object detection and localization. Our method requires two inputs: video, from which we determine occlusion regions and hence local depth ordering, and any visual recognition scheme that provides a score at local image regions, for instance object detection probabilities. We set up a cost functional that incorporates occlusion cues induced by object boundaries, label consistency, and recognition priors, and we minimize it using a convex optimization scheme. We show that our method improves the localization accuracy of existing recognition approaches or, equivalently, attaches semantic labels to pixel-level localization and segmentation.
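To make the three ingredients of the abstract concrete, the following is an illustrative sketch of what a convex relaxation of such a labeling problem can look like. It is not the functional from the paper, which is not reproduced on this page. All symbols are assumptions introduced here: $\Omega$ is the image domain, $K$ the number of labels, $u_k(x)$ a relaxed indicator that pixel $x$ takes label $k$, $s_k(x)$ a recognition score, $o_k(x)$ a hypothetical penalty for labels that contradict the occlusion-derived depth ordering, $w(x) \ge 0$ a weight that is small near object boundaries, and $\lambda, \mu \ge 0$ trade-off parameters.

```latex
\min_{u}\;
\sum_{k=1}^{K} \int_{\Omega} \bigl( -\log s_k(x) + \mu\, o_k(x) \bigr)\, u_k(x)\, dx
\;+\;
\lambda \sum_{k=1}^{K} \int_{\Omega} w(x)\, \lvert \nabla u_k(x) \rvert \, dx
\quad \text{s.t.} \quad
u_k(x) \ge 0, \qquad \sum_{k=1}^{K} u_k(x) = 1 \quad \forall x \in \Omega .
```

Under these assumptions the first term is linear in $u$, the second is a weighted total variation, and the constraint set is a product of simplices, so the problem is convex. A hard labeling could then be read off as $\arg\max_k u_k(x)$ at each pixel.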

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Taylor, B., Ayvaci, A., Ravichandran, A., Soatto, S. (2013). Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework. In: Heyden, A., Kahl, F., Olsson, C., Oskarsson, M., Tai, XC. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2013. Lecture Notes in Computer Science, vol 8081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40395-8_15

  • DOI: https://doi.org/10.1007/978-3-642-40395-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40394-1

  • Online ISBN: 978-3-642-40395-8

  • eBook Packages: Computer Science (R0)
