Abstract
This paper presents an approach to image understanding on the aspect of unsupervised scene segmentation. With the goal of image understanding in mind, we consider ‘unsupervised scene segmentation’ a task of dividing a given image into semantically meaningful regions without using annotation or other human-labeled information. We seek to investigate how well an algorithm can achieve at partitioning an image with limited human-involved learning procedures. Specifically, we are interested in developing an unsupervised segmentation algorithm that only relies on the contextual prior learned from a set of images. Our algorithm incorporates a small set of images that are similar to the input image in their scene structures. We use the sparse coding technique to analyze the appearance of this set of images; the effectiveness of sparse coding allows us to derive a priori the context of the scene from the set of images. Gaussian mixture models can then be constructed for different parts of the input image based on the sparse-coding contextual prior, and can be combined into an Markov-random-field-based segmentation process. The experimental results show that our unsupervised segmentation algorithm is able to partition an image into semantic regions, such as buildings, roads, trees, and skies, without using human-annotated information. The semantic regions generated by our algorithm can be useful, as pre-processed inputs for subsequent classification-based labeling algorithms, in achieving automatic scene annotation and scene parsing.
Similar content being viewed by others
References
Boykov Y., Veksler O., Zabih R.: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001)
Comaniciu D., Meer P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Felzenszwalb P.F., Huttenlocher D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
Hays J., Efros A.A.: Scene completion using millions of photographs. Commun. ACM 51(10), 87–94 (2008)
Hoiem, D., Efros, A.A., Hebert, M.: Geometric context from a single image. In: ICCV, pp. 654–661 (2005)
Hoiem, D., Efros, A.A., Hebert, M.: Closing the loop in scene interpretation. In: CVPR (2008)
Hubert L., Arabie P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Huttenlocher D.P., Klanderman G.A., Rucklidge W.: Comparing images using the hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 850–863 (1993)
Joulin, A., Bach, F.R., Ponce, J.: Discriminative clustering for image co-segmentation. In: CVPR, pp. 1943–1950 (2010)
Liu, C., Yuen, J., Torralba, A.B.: Nonparametric scene parsing: label transfer via dense scene alignment. In: CVPR, pp. 1972–1979 (2009)
Lowe D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: ICML, p. 87 (2009)
Mori, G., Ren, X., Efros, A.A., Malik, J.: Recovering human body configurations: combining segmentation and recognition. In: CVPR (2), pp. 326–333 (2004)
Oliva A., Torralba A.B.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
Quack, T., Leibe, B., Gool, L.J.V.: World-scale mining of objects and events from community photo collections. In: CIVR, pp. 47–56 (2008)
Rand W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
Rother C., Kolmogorov V., Blake A.: “grabcut”—interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
Rother, C., Minka, T.P., Blake, A., Kolmogorov, V.: Cosegmentation of image pairs by histogram matching—incorporating a global constraint into MRFs. In: CVPR (1), pp. 993–1000 (2006)
Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Segmenting scenes by matching image composites. In: NIPS (2009)
Russell, B.C., Freeman, W.T., Efros, A.A., Sivic, J., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: CVPR (2), pp. 1605–1614 (2006)
Russell, B.C., Torralba, A.B.: Building a database of 3d scenes from user annotations. In: CVPR, pp. 2711–2718 (2009)
Russell B.C., Torralba A.B., Murphy K.P., Freeman W.T.: Labelme: a database and web-based tool for image annotation. Int. J. Comput. Vis. 77(1–3), 157–173 (2008)
Shi J., Malik J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Simon, I., Seitz, S.M.: Scene segmentation using the wisdom of crowds. In: ECCV (2), pp. 541–553 (2008)
Snavely N., Seitz S.M., Szeliski R.: Modeling the world from internet photo collections. Int. J. Comput. Vis. 80(2), 189–210 (2008)
Tu, Z., Chen, X., Yuille, A.L., Zhu, S.C.: Image parsing: unifying segmentation, detection and recognition. In: ICCV, pp. 18–25 (2003)
Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale scene recognition from abbey to zoo. In: CVPR, pp. 3485–3492 (2010)
Yang, J., Yu, K., Huang, T.S.: Efficient highly over-complete sparse coding using a mixture model. In: ECCV (5), pp. 113–126 (2010)
Yu, K., Zhang, T., Gong, Y.: Nonlinear learning using local coordinate coding. In: NIPS (2009)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Liu, YC., Chen, HT. Unsupervised scene segmentation using sparse coding context. Machine Vision and Applications 24, 243–254 (2013). https://doi.org/10.1007/s00138-011-0401-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00138-011-0401-5