Learning Spatial Context: Using Stuff to Find Things

  • Geremy Heitz
  • Daphne Koller
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5302)


The sliding-window approach to detecting rigid objects (such as cars) is predicated on the belief that the object can be identified from the appearance in a small region around the object. Other types of objects of amorphous spatial extent (e.g., trees, sky), however, are more naturally classified based on texture or color. In this paper, we seek to combine recognition of these two types of objects into a system that leverages “context” toward improving detection. In particular, we cluster image regions based on their ability to serve as context for the detection of objects. Rather than requiring an explicit training set with region labels, our method automatically groups regions based on both their appearance and their relationships to the detections in the image. We show that our things and stuff (TAS) context model produces meaningful, readily interpretable clusters and helps improve our detection ability over state-of-the-art detectors. We also present a method for learning the active set of relationships for a particular dataset. We present results on object detection in images from the PASCAL VOC 2005/2006 datasets and on the task of overhead car detection in satellite images, demonstrating significant improvements over state-of-the-art detectors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Geremy Heitz (1)
  • Daphne Koller (1)
  1. Department of Computer Science, Stanford University, USA
