International Journal of Computer Vision, Volume 107, Issue 1, pp 40–57

Probabilistic Joint Image Segmentation and Labeling by Figure-Ground Composition

  • Adrian Ion
  • João Carreira
  • Cristian Sminchisescu

Abstract

We propose a layered statistical model for image segmentation and labeling obtained by combining independently extracted, possibly overlapping sets of figure-ground (FG) segmentations. The process of constructing consistent image segmentations, called tilings, is cast as optimization over sets of maximal cliques sampled from a graph connecting all non-overlapping figure-ground segment hypotheses. Potential functions over cliques combine unary, Gestalt-based figure qualities and pairwise compatibilities among spatially neighboring segments, constrained by T-junctions and the boundary interface statistics of real scenes. Building on the segmentation layer, we further derive a joint image segmentation and labeling model (JSL) which, given a bag of FGs, constructs a joint probability distribution over both the compatible image interpretations (tilings) composed from those segments and their labeling into categories. Drawing samples from the joint distribution can be interpreted as first sampling tilings, then sampling labelings conditioned on the chosen tiling. We learn the segmentation and labeling parameters jointly, based on maximum likelihood, with a novel estimation procedure we refer to as incremental saddle-point approximation. The partition function over tilings and labelings is approximated with increasing accuracy by including incorrect configurations that are rated as probable by candidate models during learning. State-of-the-art results are reported on the Berkeley, Stanford and Pascal VOC datasets, where an improvement of 28% was achieved for the segmentation task only (tiling), and an accuracy of 47.8% was obtained on the test set of VOC12 for semantic labeling (JSL).
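
To make the notion of a tiling concrete, the following is a minimal sketch, not the authors' implementation. It assumes each FG segment is represented as a set of pixel indices, connects every pair of segments that share no pixels, and enumerates the maximal cliques of the resulting graph with the basic Bron-Kerbosch recursion. The names `compatibility_graph` and `maximal_cliques` and the toy segments are hypothetical. The abstract describes sampling cliques and scoring them with learned potentials; exhaustive enumeration, as done here, is only feasible for toy inputs, and no scoring is shown.

```python
from itertools import combinations


def compatibility_graph(segments):
    """Connect every pair of segments that share no pixels."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if segments[i].isdisjoint(segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy 1-D "image" with pixels 0..5 and four hypothetical FG segments.
segments = [
    frozenset({0, 1, 2}),
    frozenset({2, 3}),
    frozenset({3, 4, 5}),
    frozenset({4, 5}),
]
tilings = maximal_cliques(compatibility_graph(segments))
for tiling in tilings:
    print(sorted(tiling))
```

Each clique returned is a set of mutually non-overlapping segments to which no further segment can be added without creating an overlap, which is the sense in which this sketch treats it as a candidate tiling.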

Keywords

Image segmentation, Image labeling, Semantic segmentation, Statistical models, Learning and categorization

Notes

Acknowledgments

This work was supported, in part, by CNCS-UEFICSDI, under PCE-2011-3-0438, and CT-ERC-2012-1, and by FCT under PTDC/EEA-CRO/122812/2010. The authors thank the anonymous reviewers for their useful comments and suggestions.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Adrian Ion (1)
  • João Carreira (2)
  • Cristian Sminchisescu (4, 3)

  1. Faculty of Informatics, Vienna University of Technology, Vienna, Austria
  2. Institute of Systems and Robotics, University of Coimbra, Coimbra, Portugal
  3. Department of Mathematics, Faculty of Engineering, Lund University, Lund, Sweden
  4. Institute of Mathematics of the Romanian Academy, Bucharest, Romania
