Context, Computation, and Optimal ROC Performance in Hierarchical Models

International Journal of Computer Vision

Abstract

It is widely recognized that human vision relies on contextual information, typically arising from each of many levels of analysis. Local gradient information, otherwise ambiguous, is seen as part of a smooth contour or sharp angle in the context of an object’s boundary or corner. A stroke or degraded letter, unreadable by itself, contributes to the perception of a familiar word in the context of the surrounding strokes and letters. The iconic Dalmatian dog stays invisible until a multitude of clues about body parts and posture, and figure and ground, are coherently integrated. Context is always based on knowledge about the composition of parts that make up a whole, as in the arrangement of strokes that make up a letter, the arrangement of body parts that make up an animal, or the poses and postures of individuals that make up a mob. From this point of view, the hierarchy of contextual information available to an observer derives from the compositional nature of the world being observed. We will formulate this combinatorial viewpoint in terms of probability distributions and examine the computational implications. Whereas optimal recognition performance in this formulation is NP-complete, we will give mathematical and experimental evidence that a properly orchestrated computational algorithm can achieve nearly optimal recognition within a feasible number of operations. We will interpret the notions of bottom-up and top-down processing as steps in the staging of one such orchestration.

Author information

Corresponding author

Correspondence to Stuart Geman.

Additional information

Partially supported by the Office of Naval Research under N000140610749, and the National Science Foundation under ITR-0427223 and DMS-1007593.

About this article

Cite this article

Chang, LB., Jin, Y., Zhang, W. et al. Context, Computation, and Optimal ROC Performance in Hierarchical Models. Int J Comput Vis 93, 117–140 (2011). https://doi.org/10.1007/s11263-010-0391-1

  • DOI: https://doi.org/10.1007/s11263-010-0391-1
