International Journal of Computer Vision, Volume 75, Issue 1, pp 151–172

Recovering Surface Layout from an Image

  • Derek Hoiem
  • Alexei A. Efros
  • Martial Hebert


Humans have an amazing ability to instantly grasp the overall 3D structure of a scene—ground orientation, relative positions of major landmarks, etc.—even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this “surface layout” of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis.

In this paper, we take the first step towards constructing the surface layout, a labeling of the image into geometric classes. Our main insight is to learn appearance-based models of these geometric classes, which coarsely describe the 3D scene orientation of each image region. Our multiple segmentation framework provides robust spatial support, allowing a wide variety of cues (e.g., color, texture, and perspective) to contribute to the confidence in each geometric label. In experiments on a large set of outdoor images, we evaluate the impact of the individual cues and design choices in our algorithm. We further demonstrate the applicability of our method to indoor images, describe potential applications, and discuss extensions to a more complete notion of surface layout.
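
The idea of letting several segmentations contribute to the confidence in each geometric label can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the authors' implementation: it assumes the image has already been divided into superpixels, that each segmentation hypothesis assigns every superpixel to exactly one region, and that some classifier has already produced, for each region, a distribution over geometric labels and a probability that the region is homogeneous (covers a single label). The function `label_confidence`, the data layout, the three class names (`support`, `vertical`, `sky`), and every number in the toy example are hypothetical. A superpixel's confidence in label v is taken to be the homogeneity-weighted sum of P(v | region) over the regions that contain it, one per hypothesis, normalized over labels.

```python
from typing import Dict, List

# Hypothetical type aliases for readability.
Label = str
RegionProbs = Dict[int, Dict[Label, float]]


def label_confidence(
    segmentations: List[List[int]],
    region_label_probs: List[RegionProbs],
    region_homogeneity: List[Dict[int, float]],
    labels: List[Label],
) -> List[Dict[Label, float]]:
    """Combine per-region label estimates from several segmentations.

    segmentations[j][i]        region id containing superpixel i in hypothesis j
    region_label_probs[j][r]   assumed P(label | region r of hypothesis j)
    region_homogeneity[j][r]   assumed P(region r contains a single label)
    Returns one normalized label distribution per superpixel.
    """
    num_superpixels = len(segmentations[0])
    confidences: List[Dict[Label, float]] = []
    for i in range(num_superpixels):
        scores = {v: 0.0 for v in labels}
        for j, assignment in enumerate(segmentations):
            region = assignment[i]
            # Regions that probably straddle a boundary get less say.
            weight = region_homogeneity[j][region]
            for v in labels:
                scores[v] += weight * region_label_probs[j][region].get(v, 0.0)
        total = sum(scores.values())
        if total > 0.0:
            confidences.append({v: s / total for v, s in scores.items()})
        else:
            # No evidence from any hypothesis: fall back to uniform.
            confidences.append({v: 1.0 / len(labels) for v in labels})
    return confidences


# Toy example: 3 superpixels, 2 segmentation hypotheses (made-up numbers).
LABELS = ["support", "vertical", "sky"]
segs = [[0, 0, 1], [0, 1, 1]]
probs = [
    {0: {"support": 0.7, "vertical": 0.3},
     1: {"vertical": 0.2, "sky": 0.8}},
    {0: {"support": 0.9, "vertical": 0.1},
     1: {"support": 0.1, "vertical": 0.3, "sky": 0.6}},
]
homog = [{0: 0.9, 1: 0.6}, {0: 0.8, 1: 0.5}]

conf = label_confidence(segs, probs, homog, LABELS)
best_labels = [max(c, key=c.get) for c in conf]
print(best_labels)  # ['support', 'support', 'sky']
```

Weighting each vote by the region's homogeneity means a region that likely mixes, say, ground and a building face contributes less than a clean region, and because every superpixel receives votes from several differently shaped regions, no single segmentation error decides its label.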


Keywords: surface layout, spatial layout, geometric context, scene understanding, context, object detection, model-driven segmentation, image understanding, multiple segmentations, object recognition


Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
