Context, Computation, and Optimal ROC Performance in Hierarchical Models

Chang, Lo-Bin; Jin, Ya; Zhang, Wei; Borenstein, Eran; Geman, Stuart

doi:10.1007/s11263-010-0391-1

Published: 09 October 2010

Volume 93, pages 117–140, (2011)

International Journal of Computer Vision

Abstract

It is widely recognized that human vision relies on contextual information, typically arising from each of many levels of analysis. Local gradient information, otherwise ambiguous, is seen as part of a smooth contour or sharp angle in the context of an object’s boundary or corner. A stroke or degraded letter, unreadable by itself, contributes to the perception of a familiar word in the context of the surrounding strokes and letters. The iconic Dalmatian dog stays invisible until a multitude of clues about body parts and posture, and figure and ground, are coherently integrated. Context is always based on knowledge about the composition of parts that make up a whole, as in the arrangement of strokes that make up a letter, the arrangement of body parts that make up an animal, or the poses and postures of individuals that make up a mob. From this point of view, the hierarchy of contextual information available to an observer derives from the compositional nature of the world being observed. We will formulate this combinatorial viewpoint in terms of probability distributions and examine the computational implications. Whereas achieving optimal recognition performance in this formulation is an NP-complete problem, we will give mathematical and experimental evidence that a properly orchestrated computational algorithm can achieve nearly optimal recognition within a feasible number of operations. We will interpret the notions of bottom-up and top-down processing as steps in the staging of one such orchestration.
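As a rough illustration of what "optimal ROC performance" means here, the sketch below uses a toy model that is not taken from the article: an "object" is assumed to consist of a few parts whose scores are independent unit-variance Gaussians with mean `mu`, while background scores have mean zero. By the Neyman-Pearson lemma, thresholding the likelihood ratio gives the best achievable ROC curve for a given set of observations, so comparing the test on all parts with the test on a single part shows how much the context of the other parts can help. The function names, the Gaussian model and every parameter value are assumptions chosen for illustration only.

```python
import random


def log_likelihood_ratio(scores, mu=1.0):
    """log p_object(x) / p_background(x) for independent unit-variance
    Gaussian part scores (hypothetical toy model, not the paper's)."""
    return sum(mu * x - 0.5 * mu * mu for x in scores)


def auc(positives, negatives):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs that are ranked correctly."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def simulate(n_parts=2, mu=1.0, n_samples=500, seed=0):
    """Return (AUC using all parts, AUC using only the first part)."""
    rng = random.Random(seed)
    objects = [[rng.gauss(mu, 1.0) for _ in range(n_parts)]
               for _ in range(n_samples)]
    background = [[rng.gauss(0.0, 1.0) for _ in range(n_parts)]
                  for _ in range(n_samples)]

    # Likelihood-ratio test on the whole composition.
    joint = auc([log_likelihood_ratio(x, mu) for x in objects],
                [log_likelihood_ratio(x, mu) for x in background])

    # Likelihood-ratio test on one part in isolation, ignoring context.
    single = auc([log_likelihood_ratio(x[:1], mu) for x in objects],
                 [log_likelihood_ratio(x[:1], mu) for x in background])
    return joint, single


if __name__ == "__main__":
    joint_auc, single_auc = simulate()
    print(f"AUC with all parts: {joint_auc:.3f}")
    print(f"AUC with one part:  {single_auc:.3f}")
```

Under these assumptions the theoretical AUC is roughly 0.84 for the two-part test and 0.76 for the single-part test; the simulated values fluctuate around those figures. The toy model has only one way of composing parts, so it says nothing about the combinatorial cost of evaluating the likelihood ratio in a full hierarchy, which is the computational question the abstract raises.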

Author information

Authors and Affiliations

Division of Applied Mathematics, Brown University, Providence, RI, USA
Lo-Bin Chang, Ya Jin, Wei Zhang, Eran Borenstein & Stuart Geman

Corresponding author

Correspondence to Stuart Geman.

Additional information

Partially supported by the Office of Naval Research under N000140610749, and the National Science Foundation under ITR-0427223 and DMS-1007593.

About this article

Cite this article

Chang, LB., Jin, Y., Zhang, W. et al. Context, Computation, and Optimal ROC Performance in Hierarchical Models. Int J Comput Vis 93, 117–140 (2011). https://doi.org/10.1007/s11263-010-0391-1

Received: 11 June 2010
Accepted: 21 September 2010
Published: 09 October 2010
Issue Date: June 2011
DOI: https://doi.org/10.1007/s11263-010-0391-1
