Image Parsing: Unifying Segmentation, Detection, and Recognition

Tu, Zhuowen; Chen, Xiangrong; Yuille, Alan L.; Zhu, Song-Chun

doi:10.1007/s11263-005-6642-x

Published: 01 February 2005

Volume 63, pages 113–140, (2005)
International Journal of Computer Vision

Abstract

In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, most of which are reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition. (If we use generic visual patterns only, then image parsing corresponds to image segmentation; see Tu and Zhu, 2002. IEEE Trans. PAMI, 24(5):657–673.) We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
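The division of labour described in the abstract, with a generative posterior as the invariant distribution and bottom-up cues supplying the proposals, can be illustrated with a minimal sketch. It is not the paper's algorithm: it is a plain independence Metropolis-Hastings sampler over three labels, and the names `metropolis_hastings`, `empirical_frequencies`, `posterior`, and `bottom_up`, as well as all numeric values, are hypothetical choices made for illustration only.

```python
import random


def metropolis_hastings(target, proposal, n_steps, seed=0):
    """Independence Metropolis-Hastings sampler on a finite label set.

    target:   dict label -> unnormalized posterior weight
              (stand-in for the top-down generative model)
    proposal: dict label -> proposal probability
              (stand-in for bottom-up discriminative cues)
    """
    rng = random.Random(seed)
    labels = list(proposal)
    weights = [proposal[k] for k in labels]
    # Start from a state drawn from the bottom-up proposal.
    state = rng.choices(labels, weights=weights)[0]
    samples = []
    for _ in range(n_steps):
        candidate = rng.choices(labels, weights=weights)[0]
        # Hastings ratio: the proposal terms correct for the bias of the
        # bottom-up cues, so the chain keeps `target` as its invariant law.
        ratio = (target[candidate] * proposal[state]) / (
            target[state] * proposal[candidate]
        )
        if rng.random() < min(1.0, ratio):
            state = candidate
        samples.append(state)
    return samples


def empirical_frequencies(samples):
    """Fraction of steps the chain spent in each label."""
    counts = {}
    for s in samples:
        counts[s] = counts.get(s, 0) + 1
    return {k: v / len(samples) for k, v in counts.items()}


# Hypothetical toy numbers: the posterior normalizes to 0.6 / 0.3 / 0.1,
# while the bottom-up proposal deliberately over-weights "face".
posterior = {"texture": 6.0, "shading": 3.0, "face": 1.0}
bottom_up = {"texture": 0.5, "shading": 0.2, "face": 0.3}
freqs = empirical_frequencies(metropolis_hastings(posterior, bottom_up, 20000))
```

In this toy setting the visit frequencies should settle near the normalized posterior even though the proposal is biased, which is the sense in which the proposals only drive the chain while the generative model determines where it converges.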

References

Barbu, A. and Zhu, S.C. 2003. Graph partition by Swendsen-Wang cut. In Proc. of Int’l Conf. on Computer Vision, Nice, France.
Barbu, A. and Zhu, S.C. 2004. Multi-grid and multi-level Swendsen-Wang cuts for hierarchic graph partition. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Washington DC.
Barnard, K. and Forsyth, D.A. 2001. Learning the semantics of words and pictures, ICCV.
Belongie, S., Malik, J., and Puzicha, J. 2002. Shape matching and object recognition using shape contexts, IEEE Trans. on Pattern Analysis and Machine Intelligence, 24:509–522.
Bienenstock, E., Geman, S., and Potter, D. 1997. Compositionality, MDL Priors, and Object Recognition, NIPS.
Blanchard, G. and Geman, D. 2003. Hierarchical testing designs for pattern recognition. Technical report, Math. Science, Johns Hopkins University.
Bremaud, P. 1999. Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues. Springer. (Chapter 6).
Bowyer, K.W., Kranenburg, C., and Dougherty, S. 2001. Edge detector evaluation using empirical ROC curves, Computer Vision and Image Understanding, 84(1):77–103.
Canny, J. 1986. A computational approach to edge detection. IEEE Trans. on PAMI, 8(6).
Chen, X. and Yuille, A.L. 2004. AdaBoost learning for detecting and reading text in city scenes. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. Washington DC.
Cootes, T.F., Edwards, G.J., and Taylor, C.J. 2001. Active appearance models. IEEE Trans. on PAMI, 23(6).
Comaniciu, D. and Meer, P. 1999. Mean shift analysis and applications. Proc. of ICCV.
Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. John Wiley and Sons, Inc: NY, pp. 33–36.
Dayan, P., Hinton, G., Neal, R., and Zemel, R. 1995. The Helmholtz Machine. Neural Computation, 7:889–904.
Diaconis, P. and Hanlon, P. 1992. Eigenanalysis for some examples of the Metropolis algorithms. Contemporary Mathematics, 138:99–117.
Drucker, H., Schapire, R., and Simard, P. 1993. Boosting performance in neural networks. Intl J. Pattern Rec. and Artificial Intelligence, 7(4).
Fowlkes, C. and Malik, J. 2004. How Much Does Globalization Help Segmentation? CVPR.
Freund, Y. and Schapire, R. 1996. Experiments with a new boosting algorithm. In Proc. of 13th Int’l Conference on Machine Learning.
Friedman, J., Hastie, T. and Tibshirani, R. 1998. Additive logistic regression: A statistical view of boosting, Dept. of Statistics, Stanford Univ. Technical Report.
Geman, S. and Huang, C.R. 1986. Diffusion for global optimization, SIAM J. on Control and Optimization, 24(5).
Green, P.J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732.
Hallinan, P., Gordon, G., Yuille, A., Giblin, P., and Mumford, D. 1999. Two and Three Dimensional Patterns of the Face, A.K. Peters.
Han, F. and Zhu, S.C. 2003. Bayesian reconstruction of 3D shapes and scenes from a single image. In Proc. Int’l Workshop on High Level Knowledge in 3D Modeling and Motion, Nice France.
Hastings, W.K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109.
Klein, D. and Manning, C.D. 2002. A generative constituent-context model for improved grammar induction. Proc of 40th Annual Meeting of the Assoc. for Computational Linguistics.
Konishi, S., Coughlan, J.M., Yuille, A.L., and Zhu, S.C. 2003. Statistical edge detection: learning and evaluating edge cues. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(1):57–74.
Kumar, S. and Hebert, M. 2003. Discriminative random fields. In Proc. of Int’l Conf. on Computer Vision, Nice, France.
Li, F.F., VanRullen, R., Koch, C., and Perona, P. 2003. Rapid natural scene categorization in the near absence of attention. In Proc. of National Academy of Sciences, 99(14).
Liu, J.S. 2001. Monte Carlo Strategies in Scientific Computing. Springer.
Lowe, D.G. 2003. Distinctive image features from scale-invariant keypoints, IJCV.
Maciuca, R. and Zhu, S.C. 2003. How do heuristics expedite Markov chain search. In Proc. of 3rd Workshop on Statistical and Computational Theory for Vision.
Malik, J., Belongie, S., Leung, T., and Shi, J. 2001. Contour and texture analysis for image segmentation. Int’l Journal of Computer Vision, 43(1).
Manning, C.D. and Schütze, H. 2003. Foundations of Statistical Natural Language Processing. MIT Press.
Marr, D. 1982. Vision. W.H. Freeman and Co. San Francisco.
Martin, D., Fowlkes, C., Tal, D., and Malik, J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. of 8th Int’l Conference on Computer Vision.
Metropolis, N., Rosenbluth, M.N., Rosenbluth, A.W., Teller, A.H., and Teller, E. 1953. Equation of state calculations by fast computing machines, J. Chem. Phys., 21:1087–92.
Moghaddam, B. and Pentland, A. 1997. Probabilistic visual learning for object representation. IEEE Trans. PAMI, 19(7).
Mumford, D.B. 1995. Neuronal Architectures for Pattern-theoretic Problems. In Large-Scale Neuronal Theories of the Brain. C. Koch and J. L. Davis (Eds). MIT Press: A Bradford Book.
Murphy, K., Torralba, A., and Freeman, W.T. 2003. Using the forest to see the tree: A graphical model relating features, objects and the scenes, NIPS.
Niblack, W. 1986. An Introduction to Digital Image Processing. Prentice Hall, pp. 115–116.
Phillips, P.J., Wechsler, H., Huang, J., and Rauss, P. 1998. The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing Journal, 16(5).
Ponce, J., Lazebnik, S., Rothganger, F., and Schmid, C. 2004. Toward true 3D object recognition, Reconnaissance de Formes et Intelligence Artificielle, Toulouse, FR.
Schapire, R.E. 2002. The boosting approach to machine learning: An overview, MSRI Workshop on Nonlinear Estimation and Classification.
Shi, J. and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Trans. PAMI, 22(8).
Thorpe, S., Fize, D., and Marlot, C. 1996. Speed of processing in the human visual system. Nature, 381(6582):520–522.
Treisman, A. 1986. Features and objects in visual processing. Scientific American.
Tu, Z. and Zhu, S.C. 2002. Image segmentation by Data-driven Markov chain Monte Carlo. IEEE Trans. PAMI, 24(5):657–673.
Tu, Z.W. and Zhu, S.C. 2002. Parsing images into regions, curves and curve groups, Int’l Journal of Computer Vision, (Under review), A short version appeared in the Proc. of ECCV.
Tu, Z.W. and Yuille, A.L. 2004. Shape matching and recognition: Using generative models and informative features. In Proceedings European Conference on Computer Vision. ECCV’04. Prague.
Turk, M. and Pentland, A. 1991. Eigenfaces for recognition. J. of Cognitive Neuroscience, 3(1):71–86.
Ullman, S. 1984. Visual routines. Cognition, 18:97–159.
Ullman, S. 1995. Sequence seeking and counterstreams: A model for bidirectional information flow in the cortex. In Large-Scale Neuronal Theories of the Brain. C. Koch and J. L. Davis Eds. MIT Press. A Bradford Book.
Viola, P. and Jones, M. 2001. Fast and robust classification using asymmetric Adaboost and a detector cascade, In Proc. of NIPS01.
Weber, M., Welling, M., and Perona, P. 2000. Towards Automatic Discovery of Object Categories, Proc. of CVPR.
Wu, J., Rehg, J.M., and Mullin, M.D. 2004. Learning a rare event detection cascade by direct feature selection, NIPS.
Yedidia, J.S., Freeman, W.T., and Weiss, Y. 2001. Generalized belief propagation. In Advances in Neural Information Processing Systems, 13:689–695.
Yuille, A.L. 2004. Belief Propagation and Gibbs Sampling. Submitted to Neural Computation.
Zhu, S.C. and Yuille, A.L. 1996. Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. PAMI, 18(9).
Zhu, S.C., Zhang, R., and Tu, Z.W. 2000. Integrating top-down/bottom-up for object recognition by data-driven Markov chain Monte Carlo. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC.

Author information

Authors and Affiliations

Department of Statistics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Zhuowen Tu & Xiangrong Chen
Departments of Statistics and Psychology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Alan L. Yuille
Departments of Statistics and Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Song-Chun Zhu

Corresponding author

Correspondence to Zhuowen Tu.

About this article

Cite this article

Tu, Z., Chen, X., Yuille, A.L. et al. Image Parsing: Unifying Segmentation, Detection, and Recognition. Int J Comput Vision 63, 113–140 (2005). https://doi.org/10.1007/s11263-005-6642-x

Received: 07 March 2003
Accepted: 23 June 2003
Published: 01 February 2005
Issue Date: July 2005
DOI: https://doi.org/10.1007/s11263-005-6642-x
