People Watching: Human Actions as a Cue for Single View Geometry

  • David F. Fouhey
  • Vincent Delaitre
  • Abhinav Gupta
  • Alexei A. Efros
  • Ivan Laptev
  • Josef Sivic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7576)


We present an approach that exploits the coupling between human actions and scene geometry, using human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve state-of-the-art single-view 3D scene understanding approaches. The proposed method is validated on a collection of monocular time-lapse sequences gathered from YouTube and on a dataset of still images of indoor scenes. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.
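To make the idea of pose-derived constraints concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a pose estimator has already produced, for each detected person, an action label, an image point where the body contacts a surface, and a confidence score. The names `PoseDetection`, `ACTION_TO_SURFACE`, and `accumulate_functional_evidence`, the action labels, and the grid cell size and score threshold are all hypothetical choices for illustration. The sketch only pools confidence per surface type and image cell; how such evidence would be combined with a room-layout estimate is left out.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical mapping from an action label to the kind of surface it hints at.
# These labels are assumptions for illustration only.
ACTION_TO_SURFACE = {
    "standing": "walkable",
    "sitting": "sittable",
    "reaching": "reachable",
}


@dataclass(frozen=True)
class PoseDetection:
    action: str        # hypothetical action label from a pose estimator
    contact_xy: tuple  # (x, y) image point where the body touches a surface
    score: float       # detector confidence, assumed to lie in [0, 1]


def accumulate_functional_evidence(detections, cell_size=20, min_score=0.5):
    """Sum detector confidence per (surface type, image grid cell)."""
    evidence = defaultdict(float)
    for det in detections:
        surface = ACTION_TO_SURFACE.get(det.action)
        # Skip unknown actions and low-confidence detections.
        if surface is None or det.score < min_score:
            continue
        x, y = det.contact_xy
        cell = (int(x // cell_size), int(y // cell_size))
        evidence[(surface, cell)] += det.score
    return dict(evidence)


# Toy detections, as might be pooled over frames of a time-lapse sequence.
detections = [
    PoseDetection("standing", (105, 410), 0.9),
    PoseDetection("standing", (112, 415), 0.8),
    PoseDetection("sitting", (300, 330), 0.7),
    PoseDetection("sitting", (305, 335), 0.3),  # below min_score, ignored
]
evidence = accumulate_functional_evidence(detections)
for (surface, cell), total in sorted(evidence.items()):
    print(surface, cell, round(total, 2))
```

Pooling evidence over many observations, rather than trusting any single detection, is one plausible way to tolerate noisy pose estimates; the threshold and cell size here are arbitrary and would need tuning on real data.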


Keywords (added by machine, not by the authors): Single Image, Functional Region, Indoor Scene, Scene Geometry, People Detection



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • David F. Fouhey (1)
  • Vincent Delaitre (2)
  • Abhinav Gupta (1)
  • Alexei A. Efros (1, 2)
  • Ivan Laptev (2)
  • Josef Sivic (2)

  1. Carnegie Mellon University, USA
  2. INRIA/École Normale Supérieure, Paris, France
