Abstract
We describe an approach to incorporate scene topology and semantics into pixel-level object detection and localization. Our method requires two inputs: video, from which we determine occlusion regions and thence local depth ordering, and any visual recognition scheme that provides a score at local image regions, for instance object detection probabilities. We set up a cost functional that incorporates occlusion cues induced by object boundaries, label consistency and recognition priors, and minimize it using a convex optimization scheme. We show that our method improves the localization accuracy of existing recognition approaches or, equivalently, provides semantic labels to pixel-level localization and segmentation.
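To make these ingredients concrete, here is a minimal sketch on a single 1-D scanline. It is not the authors' formulation. It assumes detector probabilities s_i in [0, 1], a unary cost 0.5 - s_i (the recognition prior), a total-variation term weighted by `lam` (label consistency), and a hinge term weighted by `mu` that treats label 1 as the front layer: for each hypothetical occlusion pair (a, b), with pixel a occluding pixel b, it penalizes labeling b as front while a is not. Relaxing the labels to x_i in [0, 1] keeps the problem convex, and the sketch solves it with the Chambolle-Pock primal-dual algorithm. The function name, parameters, cost terms and solver are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def segment_scanline(
    scores: Sequence[float],
    occlusions: Sequence[Tuple[int, int]],
    lam: float = 0.3,
    mu: float = 1.0,
    iters: int = 3000,
) -> List[float]:
    """Hypothetical toy: minimize a convex relaxation of a binary labeling energy.

    E(x) = sum_i (0.5 - s_i) * x_i               # recognition prior (assumed form)
         + lam * sum_i |x_{i+1} - x_i|           # label consistency (total variation)
         + mu  * sum_(a,b) max(0, x_b - x_a)     # assumed occlusion cue: a occludes b
    subject to 0 <= x_i <= 1 (label 1 = front layer / object).
    Solved with the Chambolle-Pock primal-dual algorithm.
    """
    n = len(scores)
    unary = [0.5 - s for s in scores]
    # Each row r of the linear operator K is stored as (m, p): (Kx)_r = x[p] - x[m].
    rows = [(i, i + 1) for i in range(n - 1)] + [(a, b) for a, b in occlusions]
    n_tv = n - 1
    # ||K||^2 <= (max nonzeros per row = 2) * (max nonzeros per column).
    degree = [0] * n
    for m, p in rows:
        degree[m] += 1
        degree[p] += 1
    norm_k = math.sqrt(2 * max(degree)) if rows else 1.0
    tau = sigma = 0.99 / norm_k  # guarantees tau * sigma * ||K||^2 < 1

    x = [0.5] * n
    x_bar = x[:]
    y = [0.0] * len(rows)
    for _ in range(iters):
        # Dual step: ascend, then project onto the domain of each conjugate term.
        for r, (m, p) in enumerate(rows):
            v = y[r] + sigma * (x_bar[p] - x_bar[m])
            if r < n_tv:
                y[r] = min(lam, max(-lam, v))  # conjugate of lam*|t| is the box [-lam, lam]
            else:
                y[r] = min(mu, max(0.0, v))    # conjugate of mu*max(0, t) is the box [0, mu]
        # Apply K^T to the dual variables.
        kty = [0.0] * n
        for r, (m, p) in enumerate(rows):
            kty[p] += y[r]
            kty[m] -= y[r]
        # Primal step: descend on the linear unary term, then project onto [0, 1].
        x_new = [min(1.0, max(0.0, x[i] - tau * (kty[i] + unary[i]))) for i in range(n)]
        # Over-relaxation of the primal iterate.
        x_bar = [2.0 * x_new[i] - x[i] for i in range(n)]
        x = x_new
    return x
```

Thresholding the result at 0.5 recovers a binary labeling. With scores `[0.1, 0.1, 0.9, 0.9, 0.4, 0.1, 0.1]` and `lam=0.2`, pixel 4 is labeled background on its own, but adding the pair `(4, 3)` (pixel 4 occludes pixel 3) pulls it into the front layer together with pixels 2 and 3. Because the TV and hinge terms are both submodular pairwise terms, the relaxation is tight here: every level set of a relaxed minimizer is a minimizer of the binary problem.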
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Taylor, B., Ayvaci, A., Ravichandran, A., Soatto, S. (2013). Semantic Video Segmentation from Occlusion Relations within a Convex Optimization Framework. In: Heyden, A., Kahl, F., Olsson, C., Oskarsson, M., Tai, XC. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2013. Lecture Notes in Computer Science, vol 8081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40395-8_15
DOI: https://doi.org/10.1007/978-3-642-40395-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40394-1
Online ISBN: 978-3-642-40395-8
eBook Packages: Computer Science, Computer Science (R0)