Recursive Compositional Models for Vision: Description and Review of Recent Work

  • Long (Leo) Zhu
  • Yuanhao Chen
  • Alan Yuille


This paper describes and reviews a class of hierarchical probabilistic models of images and objects. Visual structures are represented in a hierarchical form where complex structures are composed of more elementary structures following a design principle of recursive composition. Probabilities are defined over these structures in a way that exploits properties of the hierarchy; for example, long-range spatial relationships can be represented by local potentials at the upper levels of the hierarchy. The compositional nature of this representation enables efficient learning and inference algorithms. In particular, parts can be shared between different object models. Overall, the architecture of Recursive Compositional Models (RCMs) provides a balance between statistical and computational complexity.

The goal of this paper is to describe the basic ideas and common themes of RCMs, to illustrate their success on a range of vision tasks, and to give pointers to the literature. In particular, we show that RCMs generally give state-of-the-art results when applied to a range of different vision tasks and evaluated on the leading benchmark datasets.


Computer vision, Image processing, Machine learning, Object detection, Image parsing



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Computer Science, New York University, New York, USA
  2. Department of Statistics, University of California at Los Angeles, Los Angeles, USA
  3. Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
