International Journal of Computer Vision, Volume 93, Issue 2, pp 226–252

A Numerical Study of the Bottom-Up and Top-Down Inference Processes in And-Or Graphs

  • Tianfu Wu
  • Song-Chun Zhu


This paper presents a numerical study of the bottom-up and top-down inference processes in hierarchical models, using the And-Or graph as an example. Three inference processes are identified for each node A in a recursively defined And-Or graph in which a stochastic context-sensitive image grammar is embedded: the α(A) process detects node A directly from image features, the β(A) process computes node A by binding its child node(s) bottom-up, and the γ(A) process predicts node A top-down from its parent node(s). All three processes contribute to computing node A from images in complementary ways. The objective of our numerical study is to explore how much information each process contributes and how these processes should be integrated to improve performance. We study them in the task of object parsing using an And-Or graph formulated under the Bayesian framework. First, we isolate and train the α(A), β(A) and γ(A) processes separately by blocking the other two processes. We then evaluate the information contribution of each process individually, based on its discriminative power, and compare it with the corresponding human performance. Second, we integrate the three processes explicitly for robust inference and propose a greedy pursuit algorithm for object parsing. In experiments, we choose two hierarchical case studies: junctions and rectangles in low-to-middle-level vision, and human faces in high-level vision. We observe that (i) the effectiveness of the α(A), β(A) and γ(A) processes depends on the scale and occlusion conditions, (ii) the α(face) process is stronger than the α processes of facial components, while β(junctions) and β(rectangle) work much better than their α processes, and (iii) the integration of the three processes improves performance in ROC comparisons.
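To make the idea of integrating the three processes concrete, the following Python fragment is a minimal sketch and not the algorithm from the paper. It assumes, hypothetically, that each candidate for a node carries up to three log-ratio scores (`alpha`, `beta`, `gamma`), that a blocked process is represented by `None`, that the scores are combined by simple addition, and that overlapping candidates are suppressed by an intersection-over-union test. The names `Candidate`, `integrated_score`, `greedy_pursuit`, `threshold` and `max_overlap` are all illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Candidate:
    """A hypothetical candidate instance of a node A in the image."""
    label: str
    box: Box
    alpha: Optional[float] = None  # direct detection from image features
    beta: Optional[float] = None   # bottom-up binding of child nodes
    gamma: Optional[float] = None  # top-down prediction from a parent node


def integrated_score(c: Candidate) -> float:
    """Add up whichever of the three scores are available (None = blocked).

    Summation is an assumption made for this sketch only.
    """
    return sum(s for s in (c.alpha, c.beta, c.gamma) if s is not None)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_pursuit(cands: List[Candidate],
                   threshold: float = 0.0,
                   max_overlap: float = 0.5) -> List[Candidate]:
    """Accept candidates in decreasing score order.

    Stops once the score falls to the threshold or below, and skips any
    candidate that overlaps an already accepted one too strongly.
    """
    accepted: List[Candidate] = []
    for c in sorted(cands, key=integrated_score, reverse=True):
        if integrated_score(c) <= threshold:
            break
        if all(iou(c.box, a.box) <= max_overlap for a in accepted):
            accepted.append(c)
    return accepted


if __name__ == "__main__":
    cands = [
        Candidate("face", (0, 0, 10, 10), alpha=2.0, beta=1.0, gamma=0.5),
        Candidate("face", (1, 1, 11, 11), alpha=1.5),
        Candidate("face", (20, 20, 30, 30), beta=0.8),
        Candidate("face", (40, 40, 50, 50), alpha=-1.0),
    ]
    for c in greedy_pursuit(cands):
        print(c.label, c.box, integrated_score(c))
```

In this toy setting, blocking a process amounts to leaving its score as `None`, which loosely mirrors how the study isolates one process at a time before integrating all three.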


Keywords: Bottom-up/Top-down inference; αβγ process; Information contribution; Hierarchical model; And-Or graph; Object parsing





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Statistics, University of California, Los Angeles, USA
  2. Department of Computer Science, University of California, Los Angeles, USA
  3. Lotus Hill Research Institute (LHI), Ezhou, China
