
Indoor Scene Understanding with Geometric and Semantic Contexts

Published in: International Journal of Computer Vision

Abstract

Truly understanding a scene involves integrating information at multiple levels as well as studying the interactions between scene elements. Individual object detectors, layout estimators and scene classifiers are powerful but ultimately confounded by complicated real-world scenes with high variability, different viewpoints and occlusions. We propose a method that can automatically learn the interactions among scene elements and apply them to the holistic understanding of indoor scenes from a single image. This interpretation is performed within a hierarchical interaction model which describes an image by a parse graph, thereby fusing together object detection, layout estimation and scene classification. At the root of the parse graph is the scene type and layout while the leaves are the individual detections of objects. In between is the core of the system, our 3D Geometric Phrases (3DGP). We conduct extensive experimental evaluations on single image 3D scene understanding using both 2D and 3D metrics. The results demonstrate that our model with 3DGPs can provide robust estimation of scene type, 3D space, and 3D objects by leveraging the contextual relationships among the visual elements.
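
To make the hierarchy concrete, here is a minimal Python sketch of the parse-graph representation described above. It is our own illustration, not the authors' code: the class and field names are hypothetical. The root carries the scene type and layout hypothesis, internal nodes are 3D Geometric Phrases (3DGPs) grouping objects in learned spatial configurations, and leaves are individual object detections.

```python
# Minimal sketch of the parse-graph hierarchy from the abstract.
# All names are illustrative; the paper does not publish this interface.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectDetection:               # leaf: one detected object
    category: str                    # e.g. "sofa", "table"
    score: float                     # detector confidence
    cuboid: List[float]              # 3D bounding-cuboid parameters

@dataclass
class GeometricPhrase3D:             # internal node: a 3D Geometric Phrase
    members: List[ObjectDetection]   # objects in a learned spatial layout
    view: int                        # discretized viewpoint (one of 8 views, per Note 2)

@dataclass
class ParseGraph:                    # root: scene-level hypothesis
    scene_type: str                  # e.g. "livingroom"
    layout: List[float]              # room-layout (3D box) parameters
    phrases: List[GeometricPhrase3D] = field(default_factory=list)
    lone_objects: List[ObjectDetection] = field(default_factory=list)

    def score(self) -> float:
        """Toy energy: sum of detection scores. The real model adds scene,
        layout, and 3DGP compatibility terms learned from data."""
        dets = self.lone_objects + [d for p in self.phrases for d in p.members]
        return sum(d.score for d in dets)
```

Inference then searches over candidate parse graphs for the highest-scoring one, which jointly fixes the scene type, layout, and object set.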


Notes

  1. This representation ensures that all observation features associated with a detection take values distributed from negative to positive, making parse graphs with different numbers of objects comparable (see the sketch after these notes).

  2. Although the view-dependent biases are not viewpoint invariant, they introduce only a few additional parameters (8 views per 3DGP).

  3. The dataset is available at http://cvgl.stanford.edu/projects/3dgp/.

  4. The method in Schwing and Urtasun (2012) produces better layout estimation results; however, its code is not publicly available, so we use Hedau et al. (2009) as the baseline.
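
As a rough illustration of Note 1, one way to obtain detection features that range from negative to positive is to subtract the detector's operating threshold from each raw score, so that a weak detection lowers the total graph score instead of inflating it. Below is a minimal sketch under that assumption; the threshold value is a hypothetical placeholder, not a number from the paper.

```python
# Sketch of the zero-centered detection feature from Note 1 (our illustration).
# Shifting raw scores by a threshold makes weak detections contribute negative
# values; otherwise a parse graph with more objects would always score higher.
DETECTION_THRESHOLD = -0.5           # hypothetical operating point

def detection_feature(raw_score: float) -> float:
    return raw_score - DETECTION_THRESHOLD

print(detection_feature(1.2))        #  1.7 -> supports adding this object
print(detection_feature(-0.9))       # -0.4 -> penalizes adding this object
```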

References

  • Bao, S., Sun, M., & Savarese, S. (2010). Toward coherent object detection and scene layout understanding. In CVPR.

  • Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.


  • Chao, Y. W., Choi, W., Pantofaru, C., & Savarese, S. (2013). Layout estimation of highly cluttered indoor scenes using geometric and semantic cues. In ICIAP.

  • Choi, W., Chao, Y., Pantofaru, C., & Savarese, S. (2013). Understanding indoor scenes using 3D geometric phrases. In CVPR.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR.

  • Desai, C., Ramanan, D., & Fowlkes, C. C. (2011). Discriminative models for multi-class object layout. IJCV.

  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. IJCV.

  • Fei-Fei, L., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In CVPR, pp. 524–531.

  • Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. PAMI, 32(9), 1627–1645.


  • Fouhey, D. F., Delaitre, V., Gupta, A., Efros, A. A., Laptev, I., & Sivic, J. (2012). People watching: Human actions as a cue for single-view geometry. In ECCV.

  • Geiger, A., Wojek, C., & Urtasun, R. (2011). Joint 3D estimation of objects and scene layout. In NIPS.

  • Gupta, A., Efros, A., & Hebert, M. (2010). Blocks world revisited: Image understanding using qualitative geometry and mechanics. In ECCV.

  • Hartley, R. I., & Zisserman, A. (2004). Multiple View Geometry in Computer Vision (2nd ed.). Cambridge: Cambridge University Press, ISBN: 0521540518.

  • Hedau, V., Hoiem, D., & Forsyth, D. (2009). Recovering the spatial layout of cluttered rooms. In ICCV.

  • Hedau, V., Hoiem, D., & Forsyth, D. (2010). Thinking inside the box: Using appearance models and context based on room geometry. In ECCV.

  • Hedau, V., Hoiem, D., & Forsyth, D. (2012). Recovering free space of indoor scenes from a single image. In CVPR.

  • Hoiem, D., Efros, A. A., & Hebert, M. (2007). Recovering surface layout from an image. IJCV.

  • Hoiem, D., Efros, A. A., & Hebert, M. (2008). Putting objects in perspective. IJCV.

  • Lagarias, J. C., Reeds, J. A., Wright, M. H., & Wright, P. E. (1998). Convergence properties of the nelder-mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1), 148–158.


  • Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR.

  • Lee, D., Gupta, A., Hebert, M., & Kanade, T. (2010). Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In NIPS.

  • Lee, D., Hebert, M., & Kanade, T. (2009). Geometric reasoning for single image structure recovery. In CVPR.

  • Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Statistical Learning in Computer Vision, ECCV.

  • Li, C., Parikh, D., & Chen, T. (2012). Automatic discovery of groups of objects for scene understanding. In CVPR.

  • Li, L. J., Su, H., Xing, E. P., & Fei-Fei, L. (2010). Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. IJCV, 60(2), 91–110. doi:10.1023/B:VISI.0000029664.99615.94.


  • Pandey, M., & Lazebnik, S. (2011). Scene recognition and weakly supervised object localization with deformable part-based models. In ICCV.

  • Pero, L. D., Bowdish, J., Fried, D., Kermgard, B., Hartley, E. L., & Barnard, K. (2012). Bayesian geometric modeling of indoor scenes. In CVPR.

  • Quattoni, A., & Torralba, A. (2009). Recognizing indoor scenes. In CVPR.

  • Rother, C. (2002). A new approach for vanishing point detection in architectural environments. Image and Vision Computing, 20, 647–656.

  • Sadeghi, A., & Farhadi, A. (2011). Recognition using visual phrases. In CVPR.

  • Satkin, S., Lin, J., & Hebert, M. (2012). Data-driven scene understanding from 3D models. In BMVC.

  • Schwing, A. G., & Urtasun, R. (2012). Efficient exact inference for 3D indoor scene understanding. In ECCV.

  • Wang, H., Gould, S., & Koller, D. (2010). Discriminative learning with latent variables for cluttered indoor scene understanding. In ECCV.

  • Wang, Y., & Mori, G. (2011). Hidden part models for human action recognition: Probabilistic versus max margin. PAMI.

  • Xiang, Y., & Savarese, S. (2012). Estimating the aspect layout of object categories. In CVPR.

  • Zhao, Y., & Zhu, S. C. (2011). Image parsing via stochastic scene grammar. In NIPS.


Author information


Corresponding author

Correspondence to Wongun Choi.

Additional information

Communicated by Derek Hoiem, James Hays, Jianxiong Xiao and Aditya Khosla.


About this article


Cite this article

Choi, W., Chao, YW., Pantofaru, C. et al. Indoor Scene Understanding with Geometric and Semantic Contexts. Int J Comput Vis 112, 204–220 (2015). https://doi.org/10.1007/s11263-014-0779-4
