Abstract
This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach is based on lazy learning, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. Given a test image, it first performs global scene-level matching against the training set, followed by superpixel-level matching and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 45,676 images and 232 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Finally, we present an extension of our method to video sequences and report results on a video dataset with frames densely labeled at 1 Hz.
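The pipeline summarized above (scene-level retrieval, superpixel-level matching, MRF smoothing) can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the real system uses global features such as GIST and spatial pyramids, many superpixel features, and graph-cut MRF inference, whereas here the features are generic vectors, the scoring is a plain nearest-neighbor vote, and the contextual smoothing is a simple ICM-style stand-in.

```python
import numpy as np

def retrieval_set(query_feat, train_feats, k):
    """Scene-level matching: indices of the k training images closest to the query."""
    d = np.linalg.norm(train_feats - query_feat, axis=1)
    return np.argsort(d)[:k]

def superpixel_scores(sp_feats, ret_feats, ret_labels, n_labels, knn=3):
    """Superpixel-level matching: each test superpixel collects label votes
    from its nearest neighbors among the retrieval set's superpixels."""
    scores = np.zeros((len(sp_feats), n_labels))
    for i, f in enumerate(sp_feats):
        d = np.linalg.norm(ret_feats - f, axis=1)
        for j in np.argsort(d)[:knn]:
            scores[i, ret_labels[j]] += 1.0
    return scores

def smooth_labels(scores, edges, lam=0.5, iters=5):
    """Toy ICM stand-in for the MRF: pick each superpixel's label to trade off
    its own data score against disagreement with adjacent superpixels."""
    labels = scores.argmax(axis=1)
    n_labels = scores.shape[1]
    for _ in range(iters):
        for i in range(len(labels)):
            cost = -scores[i].copy()
            for a, b in edges:
                if a == i:
                    cost += lam * (np.arange(n_labels) != labels[b])
                elif b == i:
                    cost += lam * (np.arange(n_labels) != labels[a])
            labels[i] = cost.argmin()
    return labels
```

In this sketch a superpixel with a weak, isolated label vote gets pulled toward the label of its neighbors, which is the qualitative effect of the neighborhood-context term in the paper's MRF.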
Notes
We set K = 200 and σ = 0.8.
Note that our original system (Tighe and Lazebnik 2010) did not use the sigmoid nonlinearity, but in our subsequent work (Tighe and Lazebnik 2011) we found it necessary in order to successfully perform more complex multi-level inference. We have also found that the sigmoid is a good way of making the output of the nonparametric classifier comparable to that of other classifiers, such as boosted decision trees (see Sect. 3.1).
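The calibration role of the sigmoid mentioned in this note can be illustrated as follows; the raw scores below are made-up log-likelihood-ratio values, chosen only to show how the unbounded nonparametric scores are squashed into a (0, 1) range comparable to probabilistic outputs of other classifiers.

```python
import numpy as np

def sigmoid(x):
    """Map an unbounded score (e.g., a log-likelihood ratio) to a (0, 1) value."""
    return 1.0 / (1.0 + np.exp(-x))

# Raw nonparametric scores live on an unbounded scale, which makes them hard
# to combine with other classifiers' outputs; after the sigmoid they lie in
# (0, 1) while preserving their ordering.
raw = np.array([-3.0, 0.0, 2.5])
calibrated = sigmoid(raw)
```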
Since the videos were taken from a forward-moving camera, we have found the segmentation results to be better if we run the videos through the system backwards.
References
Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.
Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.
Brostow, G. J., Shotton, J., Fauqueur, J., & Cipolla, R. (2008). Segmentation and recognition using structure from motion point clouds. In Proceedings European conference computer vision (pp. 1–15).
Divvala, S., Hoiem, D., Hays, J., Efros, A., & Hebert, M. (2009). An empirical study of context in object detection. In Proceedings IEEE conference computer vision and pattern recognition (pp. 1271–1278).
Eigen, D., & Fergus, R. (2012). Nonparametric image parsing using adaptive neighbor sets. In Proceedings IEEE conference computer vision and pattern recognition.
Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2012). Scene parsing with multiscale feature learning, purity trees, and optimal covers. arXiv preprint.
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In Proceedings IEEE conference computer vision and pattern recognition.
Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.
Galleguillos, C., & Belongie, S. (2010). Context based object categorization: a critical survey. Computer Vision and Image Understanding, 114(6), 712–722.
Galleguillos, C., Mcfee, B., Belongie, S., & Lanckriet, G. (2010). Multi-class object localization by combining local contextual interactions. In Proceedings IEEE conference computer vision and pattern recognition.
Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In Proceedings IEEE international conference computer vision.
Grundmann, M., Kwatra, V., Han, M., & Essa, I. (2010). Efficient hierarchical graph-based video segmentation. In Proceedings IEEE conference computer vision and pattern recognition.
Gu, C., Lim, J. J., Arbeláez, P., & Malik, J. (2009). Recognition using regions. In Proceedings IEEE conference computer vision and pattern recognition.
Gupta, A., Satkin, S., Efros, A. A., & Hebert, M. (2011). From 3D scene geometry to human workspace. In Proceedings IEEE conference computer vision and pattern recognition.
Hays, J., & Efros, A. A. (2008). IM2GPS: estimating geographic information from a single image. In Proceedings IEEE conference computer vision and pattern recognition.
He, X., Zemel, R. S., & Carreira-Perpinan, M. A. (2004). Multiscale conditional random fields for image labeling. In Proceedings IEEE conference computer vision and pattern recognition.
Hedau, V., & Hoiem, D. (2010). Thinking inside the box: using appearance models and context based on room geometry. In Proceedings European conference computer vision (pp. 1–14).
Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In Proceedings IEEE international conference computer vision.
Heitz, G., & Koller, D. (2008). Learning spatial context: using stuff to find things. In Proceedings European conference computer vision (pp. 1–14).
Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1), 151–172.
Janoch, A., Karayev, S., Jia, Y., Barron, J. T., Fritz, M., Saenko, K., & Darrell, T. (2011). A category-level 3-D object dataset: putting the Kinect to work. In ICCV workshop.
Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 147–159.
Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. H. S. (2010). What, where and how many? Combining object detectors and CRFs. In Proceedings European conference computer vision (pp. 424–437).
Lai, K., Bo, L., Ren, X., & Fox, D. (2011). A scalable tree-based approach for joint object and pose recognition. In AAAI conference on artificial intelligence.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Proceedings IEEE conference computer vision and pattern recognition (Vol. 2).
Liu, C., Yuen, J., & Torralba, A. (2011a). Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2368–2382.
Liu, C., Yuen, J., & Torralba, A. (2011b). SIFT flow: dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978–994.
Malisiewicz, T., & Efros, A. A. (2008). Recognition by association via learning per-exemplar distances. In Proceedings IEEE conference computer vision and pattern recognition (pp. 1–8).
Malisiewicz, T., & Efros, A. A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In Proceedings IEEE international conference computer vision (pp. 89–96).
Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., & Kohli, P. (2011). Decision tree fields. In Proceedings IEEE international conference computer vision (pp. 1668–1675).
Oliva, A., & Torralba, A. (2006). Building the gist of a scene: the role of global image features in recognition. Progress in Brain Research, 155, 23–36.
Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., & Belongie, S. (2007). Objects in context. In Proceedings IEEE international conference computer vision (pp. 1–8).
Ren, X., & Malik, J. (2003). Learning a classification model for segmentation. In Proceedings IEEE international conference computer vision.
Russell, B. C., Torralba, A., Liu, C., Fergus, R., & Freeman, W. T. (2007). Object recognition by scene alignment. In Advances in neural information processing systems.
Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.
Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation. In Proceedings IEEE conference computer vision and pattern recognition.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings European conference computer vision (pp. 1–14).
Silberman, N., & Fergus, R. (2011). Indoor scene segmentation using a structured light sensor. In Proceedings IEEE international conference computer vision workshop.
Socher, R., Lin, C. C. Y., Ng, A. Y., & Manning, C. D. (2011). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the international conference on machine learning.
Sturgess, P., Alahari, K., Ladicky, L., & Torr, P. H. S. (2009). Combining appearance and structure from motion features for road scene understanding. In British machine vision conference (pp. 1–11).
Tighe, J., & Lazebnik, S. (2010). SuperParsing: scalable nonparametric image parsing with superpixels. In Proceedings European conference computer vision.
Tighe, J., & Lazebnik, S. (2011). Understanding scenes on many levels. In Proceedings IEEE international conference computer vision (pp. 335–342).
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1958–1970.
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings IEEE conference computer vision and pattern recognition (pp. 3485–3492).
Xiao, J., & Quan, L. (2009). Multiple view semantic segmentation for street view images. In Proceedings IEEE international conference computer vision.
Xu, C., & Corso, J. J. (2012). Evaluation of super-voxel methods for early video processing. In Proceedings IEEE conference computer vision and pattern recognition.
Zhang, C., Wang, L., & Yang, R. (2010). Semantic segmentation of urban scenes using dense depth maps. In Proceedings European conference computer vision (pp. 708–721).
Acknowledgements
This research was supported in part by NSF grants IIS-0845629 and IIS-0916829, DARPA Computer Science Study Group, Microsoft Research Faculty Fellowship, and Xerox.
Cite this article
Tighe, J., Lazebnik, S. Superparsing. Int J Comput Vis 101, 329–349 (2013). https://doi.org/10.1007/s11263-012-0574-z