Image Parsing: Unifying Segmentation, Detection, and Recognition

  • Zhuowen Tu
  • Xiangrong Chen
  • Alan Yuille
  • Song Chun Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)


In this chapter we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and reconfigures it dynamically using a set of moves, most of which are reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulate the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter compute discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities that drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this chapter, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns, including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition. (If we use generic visual patterns only, then image parsing corresponds to image segmentation [48].) We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues and, conversely, where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.
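The division of labour described above, with the generative posterior as the invariant distribution and bottom-up probabilities as proposals, can be illustrated with a minimal Metropolis-Hastings sketch. This is a toy under stated assumptions, not the chapter's implementation: the four-label state space, the numbers in `POSTERIOR` and `PROPOSAL`, and the function names are all hypothetical, and the proposal is taken to be independent of the current state for simplicity.

```python
import random

# Hypothetical toy values (not from the chapter): an unnormalized posterior
# standing in for the generative model, and a proposal distribution standing
# in for bottom-up discriminative probabilities.
POSTERIOR = {"texture": 1.0, "shading": 2.0, "face": 6.0, "text": 1.0}
PROPOSAL = {"texture": 0.1, "shading": 0.2, "face": 0.5, "text": 0.2}


def mh_step(state, posterior, proposal, rng):
    """One Metropolis-Hastings step with a state-independent proposal."""
    labels = list(proposal)
    weights = [proposal[s] for s in labels]
    # Bottom-up: draw a candidate from the proposal distribution.
    candidate = rng.choices(labels, weights=weights, k=1)[0]
    # Top-down: accept or reject using the posterior, which keeps the
    # posterior as the chain's invariant distribution.
    ratio = (posterior[candidate] * proposal[state]) / (
        posterior[state] * proposal[candidate]
    )
    if rng.random() < min(1.0, ratio):
        return candidate
    return state


def run_chain(posterior, proposal, n_steps, seed=0):
    """Run the chain and return the empirical visit frequency of each label."""
    rng = random.Random(seed)
    state = next(iter(posterior))
    counts = {s: 0 for s in posterior}
    for _ in range(n_steps):
        state = mh_step(state, posterior, proposal, rng)
        counts[state] += 1
    return {s: c / n_steps for s, c in counts.items()}
```

Calling `run_chain(POSTERIOR, PROPOSAL, n_steps=50000)` should give visit frequencies close to the normalized posterior. A proposal that already resembles the posterior leads to high acceptance rates, which is the intuition behind driving the chain with discriminative probabilities.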


Keywords: Markov Chain; Image Segmentation; Visual Pattern; Shape Descriptor; Discriminative Model




  1. Barbu, A., Zhu, S.C.: Graph partition by Swendsen-Wang cut. In: Proc. of Int’l Conf. on Computer Vision, Nice, France (October 2003)
  2. Barbu, A., Zhu, S.C.: Multi-grid and multi-level Swendsen-Wang cuts for hierarchic graph partition. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Washington DC (June 2004)
  3. Barnard, K., Forsyth, D.A.: Learning the semantics of words and pictures. In: ICCV (2001)
  4. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. on Pattern Analysis and Machine Intelligence 24, 509–522 (2002)
  5. Bienenstock, E., Geman, S., Potter, D.: Compositionality, MDL Priors, and Object Recognition. In: NIPS (1997)
  6. Blanchard, G., Geman, D.: Hierarchical testing designs for pattern recognition. Technical report, Math. Science, Johns Hopkins University (2003)
  7. Bremaud, P.: Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues, ch. 6. Springer, Heidelberg (1999)
  8. Bowyer, K.W., Kranenburg, C., Dougherty, S.: Edge detector evaluation using empirical ROC curves. Computer Vision and Image Understanding 84(1), 77–103 (2001)
  9. Canny, J.: A computational approach to edge detection. IEEE Trans. on PAMI 8(6) (November 1986)
  10. Chen, X., Yuille, A.L.: AdaBoost Learning for Detecting and Reading Text in City Scenes. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Washington DC (June 2004)
  11. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. on PAMI 23(6) (2001)
  12. Comaniciu, D., Meer, P.: Mean Shift Analysis and Applications. In: Proc. of ICCV (1999)
  13. Cover, T.M., Thomas, J.A.: Elements of Information Theory, pp. 33–36. John Wiley and Sons, Inc., NY (1991)
  14. Dayan, P., Hinton, G., Neal, R., Zemel, R.: The Helmholtz Machine. In: Neural Computation, vol. 7, pp. 889–904 (1995)
  15. Diaconis, P., Hanlon, P.: Eigenanalysis for some examples of the Metropolis algorithms. Contemporary Mathematics 138, 99–117 (1992)
  16. Drucker, H., Schapire, R., Simard, P.: Boosting performance in neural networks. Intl. J. Pattern Rec. and Artificial Intelligence 7(4) (1993)
  17. Fowlkes, C., Malik, J.: How Much Does Globalization Help Segmentation? In: CVPR 2004 (2004)
  18. Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: Proc. of 13th Int’l. Conference on Machine Learning (1996)
  19. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Dept. of Statistics, Stanford Univ., Technical Report (1998)
  20. Geman, S., Huang, C.R.: Diffusion for global optimization. SIAM J. on Control and Optimization 24(5) (1986)
  21. Green, P.J.: Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika 82(4), 711–732 (1995)
  22. Hallinan, P., Gordon, G., Yuille, A., Giblin, P., Mumford, D.: Two and Three Dimensional Patterns of the Face. A.K. Peters (1999)
  23. Han, F., Zhu, S.C.: Bayesian reconstruction of 3D shapes and scenes from a single image. In: Proc. Int’l Workshop on High Level Knowledge in 3D Modeling and Motion, Nice, France (October 2003)
  24. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)
  25. Klein, D., Manning, C.D.: A generative constituent-context model for improved grammar induction. In: Proc. of 40th Annual Meeting of the Assoc. for Computational Linguistics (July 2002)
  26. Konishi, S., Coughlan, J.M., Yuille, A.L., Zhu, S.C.: Statistical edge detection: learning and evaluating edge cues. IEEE Trans. on Pattern Analysis and Machine Intelligence 25(1), 57–74 (2003)
  27. Kumar, S., Hebert, M.: Discriminative Random Fields. In: Proc. of Int’l Conf. on Computer Vision, Nice, France (October 2003)
  28. Li, F.F., VanRullen, R., Koch, C., Perona, P.: Rapid natural scene categorization in the near absence of attention. Proc. of National Academy of Sciences 99(14) (2003)
  29. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Heidelberg (2001)
  30. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. In: IJCV (2003)
  31. Maciuca, R., Zhu, S.C.: How Do Heuristics Expedite Markov Chain Search. In: Proc. of 3rd Workshop on Statistical and Computational Theory for Vision (2003)
  32. Malik, J., Belongie, S., Leung, T., Shi, J.: Contour and texture analysis for image segmentation. Int’l Journal of Computer Vision 43(1) (2001)
  33. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (2003)
  34. Marr, D.: Vision. W.H. Freeman and Co, San Francisco (1982)
  35. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proc. of 8th Int’l Conference on Computer Vision (2001)
  36. Metropolis, N., Rosenbluth, M.N., Rosenbluth, A.W., Teller, A.H., Teller, E.: Equations of State Calculations by Fast Computing Machines. J. Chem. Phys. 21, 1087–1092 (1953)
  37. Moghaddam, B., Pentland, A.: Probabilistic visual learning for object representation. IEEE Trans. PAMI 19(7) (1997)
  38. Mumford, D.B.: Neuronal Architectures for Pattern-theoretic Problems. In: Koch, C., Davis, J.L. (eds.) Large-Scale Neuronal Theories of the Brain. A Bradford Book, MIT Press (1995)
  39. Murphy, K., Torralba, A., Freeman, W.T.: Using the forest to see the tree: a graphical model relating features, objects and the scenes. In: NIPS (2003)
  40. Niblack, W.: An Introduction to Digital Image Processing, pp. 115–116. Prentice-Hall, Englewood Cliffs (1986)
  41. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET database and evaluation procedure for face recognition algorithms. Image and Vision Computing Journal 16(5) (1998)
  42. Ponce, J., Lazebnik, S., Rothganger, F., Schmid, C.: Toward true 3D object recognition. In: Reconnaissance de Formes et Intelligence Artificielle, Toulouse, FR (2004)
  43. Rosen-Zvi, M., Jordan, M., Yuille, A.L.: The DLR Hierarchy of Approximate Inference. In: Proceedings Uncertainty in Artificial Intelligence, pp. 493–500 (2005)
  44. Schapire, R.E.: The boosting approach to machine learning: an overview. In: MSRI Workshop on Nonlinear Estimation and Classification (2002)
  45. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Trans. PAMI 22(8) (August 2000)
  46. Thorpe, S., Fize, D., Marlot, C.: Speed of processing in the human visual system. Nature 381(6582), 520–522 (1996)
  47. Treisman, A.: Features and objects in visual processing. Scientific American (November 1986)
  48. Tu, Z., Zhu, S.C.: Image segmentation by Data-driven Markov chain Monte Carlo. IEEE Trans. PAMI 24(5), 657–673 (2002)
  49. Tu, Z., Zhu, S.-C.: Parsing images into regions, curves and curve groups. Int’l Journal of Computer Vision (under review). A short version appeared in: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 393–407. Springer, Heidelberg (2002)
  50. Tu, Z., Yuille, A.L.: Shape matching and recognition – using generative models and informative features. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3023, pp. 195–209. Springer, Heidelberg (2004)
  51. Tu, Z.W., Chen, X., Yuille, A.L., Zhu, S.C.: Image Parsing: Unifying Segmentation, Detection, and Recognition. Int. Journal of Computer Vision 63(2), 113–140 (2005)
  52. Turk, M., Pentland, A.: Eigenfaces for recognition. J. of Cognitive Neurosciences 3(1), 71–86 (1991)
  53. Ullman, S.: Visual routines. Cognition 18, 97–159 (1984)
  54. Ullman, S.: Sequence Seeking and Counterstreams: A Model for Bidirectional Information Flow in the Cortex. In: Koch, C., Davis, J.L. (eds.) Large-Scale Neuronal Theories of the Brain. A Bradford Book, MIT Press, Cambridge (1995)
  55. Viola, P., Jones, M.: Fast and robust classification using asymmetric Adaboost and a detector cascade. In: Proc. of NIPS 2001 (2001)
  56. Weber, M., Welling, M., Perona, P.: Towards Automatic Discovery of Object Categories. In: Proc. of CVPR (2000)
  57. Wu, J., Rehg, J.M., Mullin, M.D.: Learning a rare event detection cascade by direct feature selection. In: NIPS (2004)
  58. Yedidia, J.S., Freeman, W.T., Weiss, Y.: Generalized belief propagation. In: Advances in Neural Information Processing Systems 13, pp. 689–695 (2001)
  59. Yuille, A.L.: Belief Propagation and Gibbs Sampling. Neural Computation (submitted, 2004)
  60. Zhu, S.C., Yuille, A.L.: Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. PAMI 18(9) (1996)
  61. Zhu, S.C., Zhang, R., Tu, Z.W.: Integrating top-down/bottom-up for object recognition by data-driven Markov chain Monte Carlo. In: Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, Hilton Head, SC (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zhuowen Tu (1)
  • Xiangrong Chen (1)
  • Alan Yuille (1)
  • Song Chun Zhu (1)
  1. Department of Statistics, UCLA, Los Angeles, USA
