Midrange Geometric Interactions for Semantic Segmentation

Constraints for Continuous Multi-label Optimization

International Journal of Computer Vision

Abstract

In this article we introduce the concept of midrange geometric constraints into semantic segmentation. We call these constraints ‘midrange’ because they are neither global constraints, which take all pixels into account without any spatial limitation, nor local constraints, which regard only single pixels or pairwise relations. Instead, the proposed constraints make it possible to discourage the occurrence of labels in the vicinity of each other, e.g., ‘wolf’ and ‘sheep’. ‘Vicinity’ encompasses spatial distance and specific spatial directions simultaneously, e.g., ‘plates’ are found directly above ‘tables’ but do not fly over them. The user defines the spatial extent of the constraint for each pair of labels. Such constraints are interesting not only for scene segmentation but also for part-based articulated or rigid objects: object parts such as arms, torso and legs usually obey specific spatial rules, which are among the few things that remain valid for articulated objects over many images and which can be expressed in terms of the proposed midrange constraints, i.e., closeness and/or direction. We show how midrange geometric constraints are formulated within a continuous multi-label optimization framework, and we give a convex relaxation that allows us to find globally optimal solutions of the relaxed problem independent of the initialization.
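To make the notion of ‘vicinity’ concrete, the following is a minimal discrete sketch, not the continuous convex formulation of the article. It assumes a hard label map and describes the spatial extent of a constraint by a set of pixel offsets; the function names `midrange_penalty`, `square_offsets` and `above_offsets`, the toy scene and the counting penalty are all hypothetical choices made for illustration.

```python
def midrange_penalty(labels, label_i, label_j, offsets):
    """Count pixels of label_j that see label_i at any of the given offsets.

    labels  : 2D list of label names (row 0 is the top of the image)
    offsets : iterable of (dy, dx) displacements that define the 'vicinity'
    This is an illustrative discrete penalty, assumed for this sketch only.
    """
    height, width = len(labels), len(labels[0])
    penalty = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x] != label_j:
                continue
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                inside = 0 <= yy < height and 0 <= xx < width
                if inside and labels[yy][xx] == label_i:
                    penalty += 1  # each offending pixel is counted once
                    break
    return penalty


def square_offsets(radius):
    """Pure proximity: every displacement within a square window."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if (dy, dx) != (0, 0)]


def above_offsets(distance):
    """Proximity plus direction: look up to `distance` rows straight up."""
    return [(-k, 0) for k in range(1, distance + 1)]


# Hypothetical toy scene, chosen only to exercise both kinds of constraint.
scene = [
    ["sky",   "sky",   "sky",   "sky"],
    ["sky",   "wolf",  "sheep", "sky"],
    ["grass", "grass", "grass", "grass"],
    ["grass", "sky",   "grass", "grass"],
]

# 'sheep' should not occur close to 'wolf' (distance only).
proximity = midrange_penalty(scene, "wolf", "sheep", square_offsets(1))

# 'sky' should not occur with 'grass' directly above it (distance and direction).
direction = midrange_penalty(scene, "grass", "sky", above_offsets(2))
```

In this toy scene each call flags exactly one pixel: the ‘sheep’ next to the ‘wolf’, and the ‘sky’ pixel in the bottom row that has ‘grass’ directly above it. Changing the offset set changes the spatial extent of the constraint, which corresponds to the per-label-pair choice the abstract leaves to the user.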

Notes

  1. The Pascal VOC dataset is not appropriate for evaluating the proposed midrange geometric priors, since the images of the Pascal VOC segmentation task contain only very few (often only one) objects and large ‘background’ areas: 64 % of the images contain at most one object, and 90 % contain at most two. The proposed constraints, however, make it possible to discourage the occurrence of labels in the vicinity of each other, e.g., to express that ‘sky’ lies above ‘ground’ or that the ‘shoes’ of a person appear below the ‘head’. We therefore chose datasets with more than three labels for the benchmark evaluations.

  2. Note that Bo and Fowlkes (2011) additionally neglected the region ‘shoes’.

References

  • Arbelaez, P., Hariharan, B., Gu, C., Gupta, S., Bourdev, L., & Malik, J. (2012). Semantic segmentation using regions and parts. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: Interactive co-segmentation with intelligent scribble guidance. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Bergbauer, J., Nieuwenhuis, C., Souiai, M., & Cremers, D. (2013). Proximity priors for variational semantic segmentation and recognition. In ICCV Workshop on Graphical Models for Scene Understanding.

  • Bo, Y., & Fowlkes, C. C. (2011). Shape-based pedestrian parsing. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In European Conference on Computer Vision (ECCV).

  • Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(7), 1312–1328.

  • Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision (JMIV), 40(1), 120–145.

  • Delong, A., & Boykov, Y. (2009). Globally optimal segmentation of multi-region objects. In IEEE International Conference on Computer Vision (ICCV).

  • Delong, A., Gorelick, L., Veksler, O., & Boykov, Y. (2012). Minimizing energies with hierarchical costs. International Journal on Computer Vision (IJCV), 100(1), 38–58.

  • Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302.

  • Felzenszwalb, P. F., & Veksler, O. (2010). Tiered scene labeling with dynamic programming. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Fröhlich, B., Rodner, E., & Denzler, J. (2012). Semantic segmentation with millions of features: Integrating multiple cues in a combined random forest approach. In Asian Conference on Computer Vision (ACCV).

  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Gould, S., Rodgers, J., Cohen, D., Elidan, G., & Koller, D. (2008). Multi-class segmentation with relative location prior. International Journal on Computer Vision (IJCV).

  • Kohli, P., Kumar, M. P., & Torr, P. H. S. (2007). P3 & beyond: Solving energies with higher order cliques. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Kohli, P., Ladicky, L., & Torr, P. H. S. (2009). Robust higher order potentials for enforcing label consistency. International Journal on Computer Vision (IJCV), 82(3), 302–324.

  • Komodakis, N., & Paragios, N. (2009). Beyond pairwise energies: Efficient optimization for higher-order MRFs. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Kontschieder, P., Kohli, P., Shotton, J., & Criminisi, A. (2013). GeoF: Geodesic forests for learning coupled predictors. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Korc, F., & Förstner, W. (2009). eTRIMS Image Database for interpreting images of man-made scenes. Technical Report, Department of Photogrammetry, University of Bonn.

  • Ladicky, L., Russell, C., Kohli, P., & Torr, P. H. S. (2009). Associative hierarchical CRFs for object class image segmentation. In IEEE International Conference on Computer Vision (ICCV).

  • Ladicky, L., Russell, C., Kohli, P., & Torr, P. H. S. (2010). Graph cut based inference with co-occurrence statistics. In European Conference on Computer Vision (ECCV).

  • Liu, X., Veksler, O., & Samarabandu, J. (2010). Order-preserving moves for graph-cut-based optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(7), 1182–1196.

  • Lucchi, A., Li, Y., Boix, X., Smith, K., & Fua, P. (2011). Are spatial and global constraints really necessary for segmentation? In IEEE International Conference on Computer Vision (ICCV).

  • Luo, P., Wang, X., & Tang, X. (2013). Pedestrian parsing via deep decompositional network. In IEEE International Conference on Computer Vision (ICCV).

  • Malisiewicz, T., & Efros, A. A. (2007). Improving spatial support for objects via multiple segmentations. In British Machine Vision Conference (BMVC).

  • Michelot, C. (1986). A finite algorithm for finding the projection of a point onto the canonical simplex of \({\mathbb{R}}^n\). Journal of Optimization Theory and Applications, 50(1), 195–200.

  • Möllenhoff, T., Nieuwenhuis, C., Toeppe, E., & Cremers, D. (2013). Efficient convex optimization for minimal partition problems with volume constraints. In Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR).

  • Nieuwenhuis, C., & Cremers, D. (2013). Spatially varying color distributions for interactive multi-label segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 35(5), 1234–1247.

  • Nieuwenhuis, C., Strekalovskiy, E., & Cremers, D. (2013). Proportion priors for image sequence segmentation. In IEEE International Conference on Computer Vision (ICCV).

  • Nieuwenhuis, C., Töppe, E., & Cremers, D. (2013). A survey and comparison of discrete and continuous multi-label optimization approaches for the Potts model. International Journal on Computer Vision (IJCV), 104(3), 223–240.

  • Nosrati, M., Andrews, S., & Hamarneh, G. (2013). Bounded labeling function for global segmentation of multi-part objects with geometric constraints. In IEEE International Conference on Computer Vision (ICCV).

  • Pock, T., & Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In IEEE International Conference on Computer Vision (ICCV).

  • Pock, T., Chambolle, A., Bischof, H., & Cremers, D. (2009). A convex relaxation approach for computing minimal partitions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Pock, T., Cremers, D., Bischof, H., & Chambolle, A. (2009). An algorithm for minimizing the Mumford–Shah functional. In IEEE International Conference on Computer Vision (ICCV).

  • Ramanan, D. (2006). Learning to parse images of articulated bodies. In Proceedings of Neural Information Processing Systems (pp. 1129–1136). Cambridge: MIT Press.

  • Savarese, S., Winn, J., & Criminisi, A. (2006). Discriminative object class models of appearance and shape by correlatons. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Shannon, C. E. (2001). A mathematical theory of communication. SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.

  • Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In European Conference on Computer Vision (ECCV).

  • Soille, P. (2003). Morphological image analysis: Principles and applications (2nd ed.). New York: Springer.

  • Souiai, M., Nieuwenhuis, C., Strekalovskiy, E., & Cremers, D. (2013). Convex optimization for scene understanding. In ICCV Workshop on Graphical Models for Scene Understanding.

  • Souiai, M., Strekalovskiy, E., Nieuwenhuis, C., & Cremers, D. (2013). A co-occurrence prior for continuous multi-label optimization. In Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR).

  • Strekalovskiy, E., & Cremers, D. (2011). Generalized ordering constraints for multilabel optimization. In IEEE International Conference on Computer Vision (ICCV).

  • Strekalovskiy, E., Goldluecke, B., & Cremers, D. (2011). Tight convex relaxations for vector-valued labeling problems. In IEEE International Conference on Computer Vision (ICCV).

  • Strekalovskiy, E., Nieuwenhuis, C., & Cremers, D. (2012). Nonmetric priors for continuous multilabel optimization. In European Conference on Computer Vision (ECCV).

  • Toeppe, E., Nieuwenhuis, C., & Cremers, D. (2013). Relative volume constraints for single view reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

  • Toeppe, E., Oswald, M. R., Cremers, D., & Rother, C. (2010). Image-based 3D modeling via Cheeger sets. In Asian Conference on Computer Vision (ACCV).

  • Vezhnevets, A., Ferrari, V., & Buhmann, J. M. (2011). Weakly supervised semantic segmentation with a multi-image model. In IEEE International Conference on Computer Vision (ICCV).

  • Wang, L., Shi, J., Song, G., & Shen, I. F. (2007). Object detection combining recognition and segmentation. In Asian Conference on Computer Vision (ACCV).

  • Yao, J., Fidler, S., & Urtasun, R. (2012). Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).

  • Zach, C., Gallup, D., Frahm, J. M., & Niethammer, M. (2008). Fast global labeling for real-time stereo using multiple plane sweeps. In Vision, Modeling and Visualization Workshop (VMV).

Acknowledgments

We thank three anonymous reviewers for their constructive feedback.

Author information

Corresponding author

Correspondence to Julia Diebold.

Additional information

Communicated by Nikos Komodakis.

Cite this article

Diebold, J., Nieuwenhuis, C. & Cremers, D. Midrange Geometric Interactions for Semantic Segmentation. Int J Comput Vis 117, 199–225 (2016). https://doi.org/10.1007/s11263-015-0828-7

  • DOI: https://doi.org/10.1007/s11263-015-0828-7
