Midrange Geometric Interactions for Semantic Segmentation

Diebold, Julia; Nieuwenhuis, Claudia; Cremers, Daniel

doi:10.1007/s11263-015-0828-7

Midrange Geometric Interactions for Semantic Segmentation: Constraints for Continuous Multi-label Optimization

Published: 10 July 2015

Volume 117, pages 199–225 (2016)
International Journal of Computer Vision

Julia Diebold¹,
Claudia Nieuwenhuis² &
Daniel Cremers¹

Abstract

In this article we introduce the concept of midrange geometric constraints into semantic segmentation. We call these constraints ‘midrange’ since they are neither global constraints, which take into account all pixels without any spatial limitation, nor local constraints, which only regard single pixels or pairwise relations. Instead, the proposed constraints make it possible to discourage the occurrence of labels in the vicinity of each other, e.g., ‘wolf’ and ‘sheep’. ‘Vicinity’ encompasses spatial distance as well as specific spatial directions simultaneously, e.g., ‘plates’ are found directly above ‘tables’, but do not fly over them. It is up to the user to define the spatial extent of the constraint between each pair of labels. Such constraints are not only interesting for scene segmentation, but also for part-based articulated or rigid objects. The reason is that object parts such as arms, torso and legs usually obey specific spatial rules, which are among the few things that remain valid for articulated objects over many images and which can be expressed in terms of the proposed midrange constraints, i.e., closeness and/or direction. We show how midrange geometric constraints are formulated within a continuous multi-label optimization framework, and we give a convex relaxation, which allows us to find globally optimal solutions of the relaxed problem independent of the initialization.
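The idea of a penalty that depends on both distance and direction can be illustrated on a discrete label map. The following is only a minimal sketch under stated assumptions, not the continuous formulation or the convex relaxation of the article: the function names (`shifted`, `dilate`, `midrange_penalty`), the offset sets and the cost table are all hypothetical and chosen for illustration. For each ordered label pair (i, j), a user-defined set of pixel offsets plays the role of the ‘vicinity’, and a cost is charged for every pixel labeled i that has a pixel labeled j inside that set.

```python
import numpy as np


def shifted(mask, dy, dx):
    """Return out with out[y, x] = mask[y + dy, x + dx]; False outside the image."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the shift moves everything out of the image
    dst_rows = slice(max(0, -dy), min(h, h - dy))
    dst_cols = slice(max(0, -dx), min(w, w - dx))
    src_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, dx), min(w, w + dx))
    out[dst_rows, dst_cols] = mask[src_rows, src_cols]
    return out


def dilate(mask, offsets):
    """True at x if mask is True at x + z for at least one offset z."""
    out = np.zeros_like(mask)
    for dy, dx in offsets:
        out |= shifted(mask, dy, dx)
    return out


def midrange_penalty(labels, offsets, cost):
    """Sum cost[(i, j)] over all pixels labeled i that have label j in their vicinity.

    labels  : 2D integer array, one label per pixel
    offsets : dict mapping (i, j) to a list of (dy, dx) pixel offsets (assumed, user-defined)
    cost    : dict mapping (i, j) to a non-negative penalty (assumed, user-defined)
    """
    total = 0.0
    for (i, j), c in cost.items():
        near_j = dilate(labels == j, offsets[(i, j)])
        total += c * np.count_nonzero((labels == i) & near_j)
    return total


# Hypothetical labels: 0 = background, 1 = 'wolf', 2 = 'sheep'.
labels = np.array([
    [0, 1, 0],
    [0, 0, 0],
    [0, 2, 0],
    [0, 0, 0],
])

# Penalize 'sheep' up to two pixels directly below 'wolf' (rows grow downwards).
offsets = {(1, 2): [(1, 0), (2, 0)]}
cost = {(1, 2): 5.0}

print(midrange_penalty(labels, offsets, cost))  # 5.0
```

Because the offset set is chosen per label pair, the same routine covers pure closeness (a symmetric disc of offsets) and pure direction (offsets on one side only). Shrinking the set above to `[(1, 0)]` yields a penalty of 0.0, since the ‘sheep’ pixel then lies outside the vicinity.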

Notes

The Pascal VOC dataset is not appropriate for the evaluation of the proposed midrange geometric priors, since the images of the Pascal VOC segmentation task contain only very few (often only one) objects and large ‘background’ areas: 64 % of the images contain at most one object and 90 % at most two. The proposed constraints, however, make it possible to discourage the occurrence of labels in the vicinity of each other and thus to express, e.g., that ‘sky’ lies above ‘ground’ or that the ‘shoes’ of a person appear below the ‘head’. We therefore chose datasets with more than three labels for the benchmark evaluations.
Note that Bo and Fowlkes (2011) additionally neglected the region ‘shoes’.

References

Arbelaez, P., Hariharan, B., Gu, C., Gupta, S., Bourdev, L., & Malik, J. (2012). Semantic segmentation using regions and parts. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: Interactive co-segmentation with intelligent scribble guidance. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Bergbauer, J., Nieuwenhuis, C., Souiai, M., & Cremers, D. (2013). Proximity priors for variational semantic segmentation and recognition. In ICCV Workshop on Graphical Models for Scene Understanding.
Bo, Y., & Fowlkes, C. C. (2011). Shape-based pedestrian parsing. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In European Conference on Computer Vision (ECCV).
Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(7), 1312–1328.
Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision (JMIV), 40(1), 120–145.
Delong, A., & Boykov, Y. (2009). Globally optimal segmentation of multi-region objects. In IEEE International Conference on Computer Vision (ICCV).
Delong, A., Gorelick, L., Veksler, O., & Boykov, Y. (2012). Minimizing energies with hierarchical costs. International Journal on Computer Vision (IJCV), 100(1), 38–58.
Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.
Felzenszwalb, P. F., & Veksler, O. (2010). Tiered scene labeling with dynamic programming. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Fröhlich, B., Rodner, E., & Denzler, J. (2012). Semantic segmentation with millions of features: Integrating multiple cues in a combined random forest approach. In Asian Conference on Computer Vision (ACCV).
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Gould, S., Rodgers, J., Cohen, D., Elidan, G., & Koller, D. (2008). Multi-class segmentation with relative location prior. International Journal on Computer Vision (IJCV).
Kohli, P., Kumar, M. P., & Torr, P. H. S. (2007). P3 & beyond: Solving energies with higher order cliques. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Kohli, P., Ladicky, L., & Torr, P. H. S. (2009). Robust higher order potentials for enforcing label consistency. International Journal on Computer Vision (IJCV), 82(3), 302–324.
Komodakis, N., & Paragios, N. (2009). Beyond pairwise energies: Efficient optimization for higher-order MRFs. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Kontschieder, P., Kohli, P., Shotton, J., & Criminisi, A. (2013). GeoF: Geodesic forests for learning coupled predictors. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Korc, F., & Förstner, W. (2009). eTRIMS Image Database for interpreting images of man-made scenes. Technical Report, Department of Photogrammetry, University of Bonn.
Ladicky, L., Russell, C., Kohli, P., & Torr, P. H. S. (2009). Associative hierarchical CRFs for object class image segmentation. In IEEE International Conference on Computer Vision (ICCV).
Ladicky, L., Russell, C., Kohli, P., & Torr, P. H. S. (2010). Graph cut based inference with co-occurrence statistics. In European Conference on Computer Vision (ECCV).
Liu, X., Veksler, O., & Samarabandu, J. (2010). Order-preserving moves for graph-cut-based optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(7), 1182–1196.
Lucchi, A., Li, Y., Boix, X., Smith, K., & Fua, P. (2011). Are spatial and global constraints really necessary for segmentation? In IEEE International Conference on Computer Vision (ICCV).
Luo, P., Wang, X., & Tang, X. (2013). Pedestrian parsing via deep decompositional network. In IEEE International Conference on Computer Vision (ICCV).
Malisiewicz, T., & Efros, A. A. (2007). Improving spatial support for objects via multiple segmentations. In British Machine Vision Conference (BMVC).
Michelot, C. (1986). A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\). Journal of Optimization Theory and Applications, 50(1), 195–200.
Möllenhoff, T., Nieuwenhuis, C., Toeppe, E., & Cremers, D. (2013). Efficient convex optimization for minimal partition problems with volume constraints. In Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR).
Nieuwenhuis, C., & Cremers, D. (2013). Spatially varying color distributions for interactive multi-label segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(5), 1234–1247.
Nieuwenhuis, C., Strekalovskiy, E., & Cremers, D. (2013). Proportion priors for image sequence segmentation. In IEEE International Conference on Computer Vision (ICCV).
Nieuwenhuis, C., Töppe, E., & Cremers, D. (2013). A survey and comparison of discrete and continuous multi-label optimization approaches for the Potts model. International Journal on Computer Vision (IJCV), 104(3), 223–240.
Nosrati, M., Andrews, S., & Hamarneh, G. (2013). Bounded labeling function for global segmentation of multi-part objects with geometric constraints. In IEEE International Conference on Computer Vision (ICCV).
Pock, T., & Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In IEEE International Conference on Computer Vision (ICCV).
Pock, T., Chambolle, A., Bischof, H., & Cremers, D. (2009). A convex relaxation approach for computing minimal partitions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Pock, T., Cremers, D., Bischof, H., & Chambolle, A. (2009). An algorithm for minimizing the Mumford–Shah functional. In IEEE International Conference on Computer Vision (ICCV).
Ramanan, D. (2006). Learning to parse images of articulated bodies. In Proceedings of Neural Information Processing Systems (pp. 1129–1136). Cambridge: MIT Press.
Savarese, S., Winn, J., & Criminisi, A. (2006). Discriminative object class models of appearance and shape by correlatons. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Shannon, C. E. (2001). A mathematical theory of communication. SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In European Conference on Computer Vision (ECCV).
Soille, P. (2003). Morphological image analysis: Principles and applications (2nd ed.). New York: Springer.
Souiai, M., Nieuwenhuis, C., Strekalovskiy, E., & Cremers, D. (2013). Convex optimization for scene understanding. In ICCV Workshop on Graphical Models for Scene Understanding.
Souiai, M., Strekalovskiy, E., Nieuwenhuis, C., & Cremers, D. (2013). A co-occurrence prior for continuous multi-label optimization. In Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR).
Strekalovskiy, E., & Cremers, D. (2011). Generalized ordering constraints for multilabel optimization. In IEEE International Conference on Computer Vision (ICCV).
Strekalovskiy, E., Goldluecke, B., & Cremers, D. (2011). Tight convex relaxations for vector-valued labeling problems. In IEEE International Conference on Computer Vision (ICCV).
Strekalovskiy, E., Nieuwenhuis, C., & Cremers, D. (2012). Nonmetric priors for continuous multilabel optimization. In European Conference on Computer Vision (ECCV).
Toeppe, E., Nieuwenhuis, C., & Cremers, D. (2013). Relative volume constraints for single view reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Toeppe, E., Oswald, M. R., Cremers, D., & Rother, C. (2010). Image-based 3D modeling via Cheeger sets. In Asian Conference on Computer Vision (ACCV).
Vezhnevets, A., Ferrari, V., & Buhmann, J. M. (2011). Weakly supervised semantic segmentation with a multi-image model. In IEEE International Conference on Computer Vision (ICCV).
Wang, L., Shi, J., Song, G., & Shen, I.-F. (2007). Object detection combining recognition and segmentation. In Asian Conference on Computer Vision (ACCV).
Yao, J., Fidler, S., & Urtasun, R. (2012). Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
Zach, C., Gallup, D., Frahm, J. M., & Niethammer, M. (2008). Fast global labeling for real-time stereo using multiple plane sweeps. In Vision, Modeling and Visualization Workshop (VMV).

Acknowledgments

We thank three anonymous reviewers for their constructive feedback.

Author information

Authors and Affiliations

Technische Universität München, Munich, Germany
Julia Diebold & Daniel Cremers
ICSI, UC Berkeley, Berkeley, USA
Claudia Nieuwenhuis

Corresponding author

Correspondence to Julia Diebold.

Additional information

Communicated by Nikos Komodakis.

About this article

Cite this article

Diebold, J., Nieuwenhuis, C. & Cremers, D. Midrange Geometric Interactions for Semantic Segmentation. Int J Comput Vis 117, 199–225 (2016). https://doi.org/10.1007/s11263-015-0828-7

Received: 01 June 2014
Accepted: 15 May 2015
Published: 10 July 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11263-015-0828-7
