
Superparsing

Scalable Nonparametric Image Parsing with Superpixels

  • Published in: International Journal of Computer Vision, 101, 329–349 (2013)

Abstract

This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach is based on lazy learning, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. Given a test image, it first performs global scene-level matching against the training set, followed by superpixel-level matching and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art nonparametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 45,676 images and 232 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem. Finally, we present an extension of our method to video sequences and report results on a video dataset with frames densely labeled at 1 Hz.
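
The pipeline summarized above can be pictured concretely with a short sketch. The Python snippet below walks through the three stages on synthetic data: global scene-level matching to build a retrieval set, superpixel-level nearest-neighbor scoring restricted to that set, and contextual smoothing over a superpixel adjacency graph. It is only an illustration under stated assumptions, not the authors' implementation: the feature dimensions, set sizes, the simplified likelihood-ratio score, and the greedy ICM-style smoothing pass (a stand-in for the paper's graph-cut MRF inference) are all assumed here.

```python
# Minimal, self-contained sketch (NumPy only, synthetic data) of the three-stage
# pipeline described in the abstract. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_train_images, n_labels, d_global, d_sp = 500, 10, 64, 32

# Synthetic "training set": one global descriptor per image, plus superpixel
# descriptors with a label and a source-image index for each superpixel.
train_global = rng.normal(size=(n_train_images, d_global))
train_sp_feats = rng.normal(size=(20_000, d_sp))
train_sp_labels = rng.integers(0, n_labels, size=20_000)
train_sp_image = rng.integers(0, n_train_images, size=20_000)

def retrieval_set(test_global, k=100):
    """Step 1: rank training images by global-descriptor distance, keep the top k."""
    dists = np.linalg.norm(train_global - test_global, axis=1)
    return np.argsort(dists)[:k]

def superpixel_scores(sp_feats, retrieved, k_nn=20):
    """Step 2: per-superpixel label scores from nearest training superpixels drawn
    only from the retrieval set (a simplified likelihood-ratio-style score)."""
    mask = np.isin(train_sp_image, retrieved)
    feats, labels = train_sp_feats[mask], train_sp_labels[mask]
    prior = np.bincount(labels, minlength=n_labels) + 1.0
    scores = np.zeros((len(sp_feats), n_labels))
    for i, f in enumerate(sp_feats):
        nn = np.argsort(np.linalg.norm(feats - f, axis=1))[:k_nn]
        counts = np.bincount(labels[nn], minlength=n_labels) + 1.0
        scores[i] = np.log(counts / prior)  # high where a label is over-represented
    return scores

def smooth_labels(scores, edges, lam=1.0, n_iters=5):
    """Step 3: contextual smoothing. A greedy ICM pass over the adjacency graph
    stands in for the paper's graph-cut MRF optimization."""
    labels = scores.argmax(axis=1)
    nbrs = {i: [] for i in range(len(scores))}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    for _ in range(n_iters):
        for i in range(len(scores)):
            disagree = np.array([sum(l != labels[j] for j in nbrs[i])
                                 for l in range(n_labels)])
            labels[i] = np.argmax(scores[i] - lam * disagree)
    return labels

# Toy "test image": 50 superpixels with a chain adjacency structure.
test_global = rng.normal(size=d_global)
test_sp = rng.normal(size=(50, d_sp))
edges = [(i, i + 1) for i in range(49)]
print(smooth_labels(superpixel_scores(test_sp, retrieval_set(test_global)), edges))
```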


Notes

  1. We set K = 200 and σ = 0.8.

  2. Code: http://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html.

  3. Note that our original system (Tighe and Lazebnik 2010) did not use the sigmoid nonlinearity, but in our subsequent work (Tighe and Lazebnik 2011) we found it necessary for successfully performing more complex multi-level inference. We have also found that the sigmoid is a good way of making the output of the nonparametric classifier comparable to that of other classifiers, for example, boosted decision trees (see Sect. 3.1); a minimal sketch of this calibration follows these notes.

  4. http://videosegmentation.com/.

  5. Since the videos were taken from a forward-moving camera, we have found the segmentation results to be better if we run the videos through the system backwards.
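
Note 3 mentions passing the nearest-neighbor label scores through a sigmoid so they become comparable to the outputs of other classifiers such as boosted decision trees. The snippet below is a minimal sketch of that calibration step, assuming raw log-likelihood-ratio scores as input; the slope parameter alpha and the exact parameterization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def calibrate(raw_scores, alpha=1.0):
    """Squash unbounded per-label scores into (0, 1) with a logistic sigmoid.
    alpha is an assumed slope parameter controlling how quickly scores saturate."""
    return 1.0 / (1.0 + np.exp(-alpha * np.asarray(raw_scores)))

# Example: four per-label likelihood-ratio scores for one superpixel.
raw = [-3.2, 0.0, 1.7, 5.4]
print(calibrate(raw))  # roughly [0.04, 0.50, 0.85, 1.00]
```

After this mapping the nonparametric scores live on the same (0, 1) scale as other classifier outputs, which is what makes the comparison discussed in Sect. 3.1 straightforward.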

References

  • Boykov, Y., & Kolmogorov, V. (2004). An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 1124–1137.


  • Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11), 1222–1239.

  • Brostow, G. J., Shotton, J., Fauqueur, J., & Cipolla, R. (2008). Segmentation and recognition using structure from motion point clouds. In Proceedings European conference computer vision (pp. 1–15).


  • Divvala, S., Hoiem, D., Hays, J., Efros, A., & Hebert, M. (2009). An empirical study of context in object detection. In Proceedings IEEE conference computer vision and pattern recognition (pp. 1271–1278).


  • Eigen, D., & Fergus, R. (2012). Nonparametric image parsing using adaptive neighbor sets. In Proceedings IEEE conference computer vision and pattern recognition.

  • Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2012). Scene parsing with multiscale feature learning, purity trees, and optimal covers. arXiv preprint.

  • Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In Proceedings IEEE conference computer vision and pattern recognition.

  • Felzenszwalb, P. F., & Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2), 167–181.

  • Galleguillos, C., & Belongie, S. (2010). Context based object categorization: a critical survey. Computer Vision and Image Understanding, 114(6), 712–722.


  • Galleguillos, C., McFee, B., Belongie, S., & Lanckriet, G. (2010). Multi-class object localization by combining local contextual interactions. In Proceedings IEEE conference computer vision and pattern recognition.

  • Gould, S., Fulton, R., & Koller, D. (2009). Decomposing a scene into geometric and semantically consistent regions. In Proceedings IEEE international conference computer vision.


  • Grundmann, M., Kwatra, V., Han, M., & Essa, I. (2010). Efficient hierarchical graph-based video segmentation. In Proceedings IEEE conference computer vision and pattern recognition.


  • Gu, C., Lim, J. J., Arbeláez, P., & Malik, J. (2009). Recognition using regions. In Proceedings IEEE conference computer vision and pattern recognition.

  • Gupta, A., Satkin, S., Efros, A. A., & Hebert, M. (2011). From 3D scene geometry to human workspace. In Proceedings IEEE conference computer vision and pattern recognition.


  • Hays, J., & Efros, A. A. (2008). im2gps: estimating geographic information from a single image. In Proceedings IEEE conference computer vision and pattern recognition.

  • He, X., Zemel, R. S., & Carreira-Perpinan, M. A. (2004). Multiscale conditional random fields for image labeling. In Proceedings IEEE conference computer vision and pattern recognition.


  • Hedau, V., & Hoiem, D. (2010). Thinking inside the box: using appearance models and context based on room geometry. In Proceedings European conference computer vision (pp. 1–14).


  • Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In Proceedings IEEE international conference computer vision.


  • Heitz, G., & Koller, D. (2008). Learning spatial context: using stuff to find things. In Proceedings European conference computer vision (pp. 1–14).


  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. International Journal of Computer Vision, 75(1).

  • Janoch, A., Karayev, S., Jia, Y., Barron, J. T., Fritz, M., Saenko, K., & Darrell, T. (2011). A category-level 3-D object dataset: putting the Kinect to work. In ICCV workshop.


  • Kolmogorov, V., & Zabih, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 147–159.


  • Ladicky, L., Sturgess, P., Alahari, K., Russell, C., & Torr, P. H. S. (2010). What, where and how many? Combining object detectors and CRFs. In Proceedings European conference computer vision (pp. 424–437).


  • Lai, K., Bo, L., Ren, X., & Fox, D. (2011). A scalable tree-based approach for joint object and pose recognition. In Proceedings AAAI conference on artificial intelligence.

  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Proceedings IEEE conference computer vision and pattern recognition (Vol. 2).


  • Liu, C., Yuen, J., & Torralba, A. (2011a). Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2368–2382.


  • Liu, C., Yuen, J., & Torralba, A. (2011b). SIFT flow: dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978–994.


  • Malisiewicz, T., & Efros, A. A. (2008). Recognition by association via learning per-exemplar distances. In Proceedings IEEE conference computer vision and pattern recognition (pp. 1–8).


  • Malisiewicz, T., & Efros, A. A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In Proceedings IEEE international conference computer vision (pp. 89–96).


  • Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., & Kohli, P. (2011). Decision tree fields. In Proceedings IEEE international conference computer vision (pp. 1668–1675).

  • Oliva, A., & Torralba, A. (2006). Building the gist of a scene: the role of global image features in recognition. Progress in Brain Research: Visual Perception, 155, 23–36.

  • Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., & Belongie, S. (2007). Objects in context. In Proceedings IEEE international conference computer vision (pp. 1–8).


  • Ren, X., & Malik, J. (2003). Learning a classification model for segmentation. In Proceedings IEEE international conference computer vision.


  • Russell, B. C., Torralba, A., Liu, C., Fergus, R., & Freeman, W. T. (2007). Object recognition by scene alignment. In Advances in neural information processing systems.

  • Russell, B. C., Torralba, A., Murphy, K. P., & Freeman, W. T. (2008). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1–3), 157–173.


  • Shotton, J., Johnson, M., & Cipolla, R. (2008). Semantic texton forests for image categorization and segmentation. In Proceedings IEEE conference computer vision and pattern recognition.


  • Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings European conference computer vision (pp. 1–14).


  • Silberman, N., & Fergus, R. (2011). Indoor scene segmentation using a structured light sensor. In Proceedings IEEE international conference computer vision workshop.


  • Socher, R., Lin, C. C. Y., Ng, A. Y., & Manning, C. D. (2011). Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the international conference on machine learning.


  • Sturgess, P., Alahari, K., Ladicky, L., & Torr, P. H. S. (2009). Combining appearance and structure from motion features for road scene understanding. In British machine vision conference (pp. 1–11).


  • Tighe, J., & Lazebnik, S. (2010). SuperParsing: scalable nonparametric image parsing with superpixels. In Proceedings European conference computer vision.


  • Tighe, J., & Lazebnik, S. (2011). Understanding scenes on many levels. In Proceedings IEEE international conference computer vision (pp. 335–342).


  • Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1958–1970.


  • Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings IEEE conference computer vision and pattern recognition (pp. 3485–3492).


  • Xiao, J., & Quan, L. (2009). Multiple view semantic segmentation for street view images. In Proceedings IEEE international conference computer vision.


  • Xu, C., & Corso, J. J. (2012). Evaluation of super-voxel methods for early video processing. In Proceedings IEEE conference computer vision and pattern recognition.


  • Zhang, C., Wang, L., & Yang, R. (2010). Semantic segmentation of urban scenes using dense depth maps. In Proceedings European conference computer vision (pp. 708–721).



Acknowledgements

This research was supported in part by NSF grants IIS-0845629 and IIS-0916829, the DARPA Computer Science Study Group, a Microsoft Research Faculty Fellowship, and Xerox.

Author information

Correspondence to Joseph Tighe.


About this article

Cite this article

Tighe, J., Lazebnik, S. Superparsing. Int J Comput Vis 101, 329–349 (2013). https://doi.org/10.1007/s11263-012-0574-z
