Image Parsing: Unifying Segmentation, Detection, and Recognition

International Journal of Computer Vision

Abstract

In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition. (If we use generic visual patterns only, then image parsing corresponds to image segmentation; see Tu and Zhu, 2002. IEEE Trans. PAMI, 24(5):657–673.) We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
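
The division of labour described in the abstract, with a generative posterior as the target and discriminative probabilities as the proposal, can be illustrated with a toy sampler. The following is a minimal sketch only, not the paper's algorithm: it runs an independence Metropolis-Hastings chain over four hypothetical labels for a single region, whereas the paper uses reversible jumps over whole parsing graphs. The function name, the label set, and all numeric weights are assumptions made up for illustration.

```python
import random


def metropolis_hastings(target, proposal, states, n_steps, seed=0):
    """Independence Metropolis-Hastings on a finite state space.

    target:   dict state -> unnormalized posterior weight
              (stands in for the generative, top-down model)
    proposal: dict state -> proposal probability q(state)
              (stands in for the discriminative, bottom-up cues)
    Returns the empirical visit frequency of each state.
    """
    rng = random.Random(seed)
    weights = [proposal[s] for s in states]
    current = rng.choices(states, weights=weights)[0]
    counts = {s: 0 for s in states}
    for _ in range(n_steps):
        # Bottom-up step: draw a candidate from the proposal.
        candidate = rng.choices(states, weights=weights)[0]
        # Top-down step: accept or reject using the posterior ratio,
        # corrected by the proposal ratio so the target stays invariant.
        ratio = (target[candidate] * proposal[current]) / (
            target[current] * proposal[candidate]
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        counts[current] += 1
    return {s: counts[s] / n_steps for s in states}


# Hypothetical labels and weights, chosen only for this illustration.
states = ["texture", "shading", "face", "text"]
target = {"texture": 1.0, "shading": 2.0, "face": 6.0, "text": 1.0}
proposal = {"texture": 0.2, "shading": 0.2, "face": 0.5, "text": 0.1}

freq = metropolis_hastings(target, proposal, states, n_steps=20000)
for s in states:
    print(s, round(freq[s], 3))
```

In this sketch the visit frequencies approach the normalized target weights (roughly 0.1, 0.2, 0.6, 0.1) whatever the proposal is, provided it gives every state nonzero probability. A proposal that already resembles the target only makes the chain accept more often, which is the sense in which bottom-up cues "drive" the search without changing what it converges to.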

References

  • Barbu, A. and Zhu, S.C. 2003. Graph partition by Swendsen-Wang cut. In Proc. of Int’l Conf. on Computer Vision, Nice, France.

  • Barbu, A. and Zhu, S.C. 2004. Multi-grid and multi-level Swendsen-Wang cuts for hierarchic graph partition. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Washington DC.

  • Barnard, K. and Forsyth, D.A. 2001. Learning the semantics of words and pictures, ICCV.

  • Belongie, S., Malik, J., and Puzicha, J. 2002. Shape matching and object recognition using shape contexts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24:509–522.

  • Bienenstock, E., Geman, S., and Potter, D. 1997. Compositionality, MDL Priors, and Object Recognition, NIPS.

  • Blanchard, G. and Geman, D. 2003. Hierarchical testing designs for pattern recognition. Technical report, Math. Science, Johns Hopkins University.

  • Bremaud, P. 1999. Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues. Springer. (Chapter 6).

  • Bowyer, K.W., Kranenburg, C., and Dougherty, S. 2001. Edge detector evaluation using empirical ROC curves. Computer Vision and Image Understanding, 84(1):77–103.

  • Canny, J. 1986. A computational approach to edge detection. IEEE Trans. on PAMI, 8(6).

  • Chen, X. and Yuille, A.L. 2004. AdaBoost learning for detecting and reading text in city scenes. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition. Washington DC.

  • Cootes, T.F., Edwards, G.J., and Taylor, C.J. 2001. Active appearance models. IEEE Trans. on PAMI, 23(6).

  • Comaniciu, D. and Meer, P. 1999. Mean shift analysis and applications. Proc. of ICCV.

  • Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. John Wiley and Sons, Inc: NY, pp. 33–36.

  • Dayan, P., Hinton, G., Neal, R., and Zemel, R. 1995. The Helmholtz Machine. Neural Computation, 7:889–904.

  • Diaconis, P. and Hanlon, P. 1992. Eigenanalysis for some examples of the Metropolis algorithms. Contemporary Mathematics, 138:99–117.

  • Drucker, H., Schapire, R., and Simard, P. 1993. Boosting performance in neural networks. Intl J. Pattern Rec. and Artificial Intelligence, 7(4).

  • Fowlkes, C. and Malik, J. 2004. How Much Does Globalization Help Segmentation? CVPR.

  • Freund, Y. and Schapire, R. 1996. Experiments with a new boosting algorithm. In Proc. of 13th Int’l Conference on Machine Learning.

  • Friedman, J., Hastie, T. and Tibshirani, R. 1998. Additive logistic regression: A statistical view of boosting, Dept. of Statistics, Stanford Univ. Technical Report.

  • Geman, S. and Huang, C.R. 1986. Diffusion for global optimization, SIAM J. on Control and Optimization, 24(5).

  • Green, P.J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732.

  • Hallinan, P., Gordon, G., Yuille, A., Giblin, P., and Mumford, D. 1999. Two and Three Dimensional Patterns of the Face, A.K. Peters.

  • Han, F. and Zhu, S.C. 2003. Bayesian reconstruction of 3D shapes and scenes from a single image. In Proc. Int’l Workshop on High Level Knowledge in 3D Modeling and Motion, Nice France.

  • Hastings, W.K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109.

  • Klein, D. and Manning, C.D. 2002. A generative constituent-context model for improved grammar induction. Proc of 40th Annual Meeting of the Assoc. for Computational Linguistics.

  • Konishi, S., Coughlan, J.M., Yuille, A.L., and Zhu, S.C. 2003. Statistical edge detection: learning and evaluating edge cues. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(1):57–74.

  • Kumar, S. and Hebert, M. 2003. Discriminative random fields. In Proc. of Int’l Conf. on Computer Vision, Nice, France.

  • Li, F.F., VanRullen, R., Koch, C., and Perona, P. 2003. Rapid natural scene categorization in the near absence of attention. In Proc. of National Academy of Sciences, 99(14).

  • Liu, J.S. 2001. Monte Carlo Strategies in Scientific Computing. Springer.

  • Lowe, D.G. 2003. Distinctive image features from scale-invariant keypoints, IJCV.

  • Maciuca, R. and Zhu, S.C. 2003. How do heuristics expedite Markov chain search. In Proc. of 3rd Workshop on Statistical and Computational Theory for Vision.

  • Malik, J., Belongie, S., Leung, T., and Shi, J. 2001. Contour and texture analysis for image segmentation. Int’l Journal of Computer Vision, 43(1).

  • Manning, C.D. and Schütze, H. 2003. Foundations of Statistical Natural Language Processing. MIT Press.

  • Marr, D. 1982. Vision. W.H. Freeman and Co. San Francisco.

  • Martin, D., Fowlkes, C., Tal, D., and Malik, J. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. of 8th Int’l Conference on Computer Vision.

  • Metropolis, N., Rosenbluth, M.N., Rosenbluth, A.W., Teller, A.H., and Teller, E. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–92.

  • Moghaddam, B. and Pentland, A. 1997. Probabilistic visual learning for object representation. IEEE Trans. PAMI, 19(7).

  • Mumford, D.B. 1995. Neuronal Architectures for Pattern-theoretic Problems. In Large-Scale Neuronal Theories of the Brain. C. Koch and J. L. Davis (Eds). MIT Press: A Bradford Book.

  • Murphy, K., Torralba, A., and Freeman, W.T. 2003. Using the forest to see the tree: A graphical model relating features, objects and the scenes, NIPS.

  • Niblack, W. 1986. An Introduction to Digital Image Processing. Prentice Hall, pp. 115–116.

  • Phillips, P.J., Wechsler, H., Huang, J., and Rauss, P. 1998. The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing Journal, 16(5).

  • Ponce, J., Lazebnik, S., Rothganger, F., and Schmid, C. 2004. Toward true 3D object recognition, Reconnaissance de Formes et Intelligence Artificielle, Toulouse, FR.

  • Schapire, R.E. 2002. The boosting approach to machine learning: An overview, MSRI Workshop on Nonlinear Estimation and Classification.

  • Shi, J. and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Trans. PAMI, 22(8).

  • Thorpe, S., Fize, D., and Marlot, C. 1996. Speed of processing in the human visual system. Nature, 381(6582):520–522.

  • Treisman, A. 1986. Features and objects in visual processing. Scientific American.

  • Tu, Z. and Zhu, S.C. 2002. Image segmentation by Data-driven Markov chain Monte Carlo. IEEE Trans. PAMI, 24(5):657–673.

  • Tu, Z.W. and Zhu, S.C. 2002. Parsing images into regions, curves and curve groups, Int’l Journal of Computer Vision, (Under review), A short version appeared in the Proc. of ECCV.

  • Tu, Z.W. and Yuille, A.L. 2004. Shape matching and recognition: Using generative models and informative features. In Proceedings European Conference on Computer Vision. ECCV’04. Prague.

  • Turk, M. and Pentland, A. 1991. Eigenfaces for recognition. J. of Cognitive Neurosciences, 3(1):71–86.

  • Ullman, S. 1984. Visual routines. Cognition, 18:97–159.

  • Ullman, S. 1995. Sequence seeking and counterstreams: A model for bidirectional information flow in the cortex. In Large-Scale Neuronal Theories of the Brain. C. Koch and J. L. Davis Eds. MIT Press. A Bradford Book.

  • Viola, P. and Jones, M. 2001. Fast and robust classification using asymmetric Adaboost and a detector cascade, In Proc. of NIPS01.

  • Weber, M., Welling, M., and Perona, P. 2000. Towards Automatic Discovery of Object Categories, Proc. of CVPR.

  • Wu, J., Rehg, J.M., and Mullin, M.D. 2004. Learning a rare event detection cascade by direct feature selection, NIPS.

  • Yedidia, J.S., Freeman, W.T., and Weiss, Y. 2001. Generalized belief propagation. In Advances in Neural Information Processing Systems, 13:689–695.

  • Yuille, A.L. 2004. Belief Propagation and Gibbs Sampling. Submitted to Neural Computation.

  • Zhu, S.C. and Yuille, A.L. 1996. Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. PAMI, 18(9).

  • Zhu, S.C., Zhang, R., and Tu, Z.W. 2000. Integrating top-down/bottom-up for object recognition by data-driven Markov chain Monte Carlo. In Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC.

Author information

Corresponding author

Correspondence to Zhuowen Tu.

About this article

Cite this article

Tu, Z., Chen, X., Yuille, A.L. et al. Image Parsing: Unifying Segmentation, Detection, and Recognition. Int J Comput Vision 63, 113–140 (2005). https://doi.org/10.1007/s11263-005-6642-x

  • DOI: https://doi.org/10.1007/s11263-005-6642-x
