International Journal of Computer Vision, Volume 61, Issue 1, pp 55–79

Pictorial Structures for Object Recognition

  • Pedro F. Felzenszwalb
  • Daniel P. Huttenlocher


In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We address the problem of using pictorial structure models to find instances of an object in an image as well as the problem of learning an object model from training examples, presenting efficient algorithms in both cases. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
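The model described above can be read as an energy over part placements: each part pays an appearance (match) cost for where it is placed, and each connected pair of parts pays a spring-like cost for deviating from its ideal relative position. The following is a minimal sketch of that idea, assuming a tree of connections, a small discrete set of candidate locations, and a quadratic spring cost. All names and parameters (`spring_cost`, `best_configuration`, `rest`, `stiffness`) are hypothetical and chosen for illustration; this naive dynamic program is not the efficient algorithm the paper presents.

```python
def spring_cost(a, b, rest, stiffness=1.0):
    """Quadratic 'spring' penalty for a connected pair placed at a and b.

    rest is the ideal offset (dx, dy) of b relative to a (an assumed parameter).
    """
    dx = b[0] - a[0] - rest[0]
    dy = b[1] - a[1] - rest[1]
    return stiffness * (dx * dx + dy * dy)


def best_configuration(locations, match_cost, children, rest, root=0):
    """Minimize total match cost plus spring cost over a tree of parts.

    locations:  list of candidate (x, y) positions shared by all parts
    match_cost: match_cost[i][k] = appearance cost of part i at locations[k]
    children:   dict mapping a part to the list of its child parts
    rest:       dict mapping (parent, child) to the ideal child offset
    """
    n = len(locations)
    best_child_choice = {}  # (child, parent location index) -> child location index

    def solve(part):
        # table[k] = best cost of the subtree rooted at `part`
        # when `part` itself is placed at locations[k].
        table = list(match_cost[part])
        for child in children.get(part, []):
            child_table = solve(child)
            for k, loc in enumerate(locations):
                costs = [
                    child_table[j] + spring_cost(loc, locations[j], rest[(part, child)])
                    for j in range(n)
                ]
                j_best = min(range(n), key=costs.__getitem__)
                best_child_choice[(child, k)] = j_best
                table[k] += costs[j_best]
        return table

    root_table = solve(root)
    k_root = min(range(n), key=root_table.__getitem__)

    # Walk back down the tree to recover each part's chosen location.
    placement = {root: k_root}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            placement[child] = best_child_choice[(child, placement[part])]
            stack.append(child)
    return root_table[k_root], {p: locations[k] for p, k in placement.items()}


# Toy example: a root part with one child to its right and one to its left.
locations = [(0, 0), (1, 0), (2, 0), (3, 0)]
match_cost = [
    [5, 0, 5, 5],  # part 0 matches best at (1, 0)
    [5, 5, 0, 5],  # part 1 matches best at (2, 0)
    [0, 5, 5, 5],  # part 2 matches best at (0, 0)
]
children = {0: [1, 2]}
rest = {(0, 1): (1, 0), (0, 2): (-1, 0)}
cost, placement = best_configuration(locations, match_cost, children, rest)
```

Because the connections form a tree, each part's table depends only on its children, so the best placement is found without enumerating every joint configuration. The sketch still compares every pair of locations for each connection, which is quadratic in the number of candidate locations.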

Keywords: part-based object recognition, statistical models, energy minimization




  1. Amini, A.A., Weymouth, T.E., and Jain, R.C. 1990. Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):855-867.
  2. Amit, Y. and Geman, D. 1999. A computational model for visual selection. Neural Computation, 11(7):1691-1715.
  3. Ayache, N.J. and Faugeras, O.D. 1986. Hyper: A new approach for the recognition and positioning of two-dimensional objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):44-54.
  4. Berger, J.O. 1985. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag.
  5. Borgefors, G. 1986. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34(3):344-371.
  6. Borgefors, G. 1988. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):849-865.
  7. Boykov, Y., Veksler, O., and Zabih, R. 2001. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222-1239.
  8. Bregler, C. and Malik, J. 1998. Tracking people with twists and exponential maps. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8-15.
  9. Burl, M.C. and Perona, P. 1996. Recognition of planar object classes. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 223-230.
  10. Burl, M.C., Weber, M., and Perona, P. 1998. A probabilistic approach to object recognition using local photometry and global geometry. In European Conference on Computer Vision, pp. II:628-641.
  11. Chow, C.K. and Liu, C.N. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462-467.
  12. Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1996. Introduction to Algorithms. MIT Press and McGraw-Hill.
  13. Dickinson, S.J., Biederman, I., Pentland, A.P., Eklundh, J.O., Bergevin, R., and Munck-Fairwood, R.C. 1993. The use of geons for generic 3-d object recognition. In International Joint Conference on Artificial Intelligence, pp. 1693-1699.
  14. Felzenszwalb, P.F. and Huttenlocher, D.P. 2000. Efficient matching of pictorial structures. In IEEE Conference on Computer Vision and Pattern Recognition, pp. II:66-73.
  15. Fischler, M.A. and Bolles, R.C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395.
  16. Fischler, M.A. and Elschlager, R.A. 1973. The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1):67-92.
  17. Freeman, W.T. and Adelson, E.H. 1991. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906.
  18. Gdalyahu, Y. and Weinshall, D. 1999. Flexible syntactic matching of curves and its application to automatic hierarchical classification of silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12):1312-1328.
  19. Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741.
  20. Grimson, W.E.L. and Lozano-Perez, T. 1987. Localizing overlapping parts by searching the interpretation tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):469-482.
  21. Gumbel, E.J., Greenwood, J.A., and Durand, D. 1953. The circular normal distribution: Theory and tables. Journal of the American Statistical Association, 48:131-152.
  22. Huttenlocher, D.P., Klanderman, G.A., and Rucklidge, W.J. 1993. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850-863.
  23. Huttenlocher, D.P. and Ullman, S. 1990. Recognizing solid objects by alignment with an image. International Journal of Computer Vision, 5(2):195-212.
  24. Ioffe, S. and Forsyth, D.A. 2001. Probabilistic methods for finding people. International Journal of Computer Vision, 43(1):45-68.
  25. Ishikawa, H. and Geiger, D. 1998. Segmentation by grouping junctions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 125-131.
  26. Ju, S.X., Black, M.J., and Yacoob, Y. 1996. Cardboard people: A parameterized model of articulated motion. In International Conference on Automatic Face and Gesture Recognition, pp. 38-44.
  27. Karzanov, A.V. 1992. Quick algorithm for determining the distances from the points of the given subset of an integer lattice to the points of its complement. Cybernetics and System Analysis, pp. 177-181. Translation from the Russian by Julia Komissarchik.
  28. Lamdan, Y., Schwartz, J.T., and Wolfson, H.J. 1990. Affine invariant model-based object recognition. IEEE Transactions on Robotics and Automation, 6(5):578-589.
  29. Moghaddam, B. and Pentland, A.P. 1997. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710.
  30. Murase, H. and Nayar, S.K. 1995. Visual learning and recognition of 3-d objects from appearance. International Journal of Computer Vision, 14(1):5-24.
  31. Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
  32. Pentland, A.P. 1987. Recognition by parts. In IEEE International Conference on Computer Vision, pp. 612-620.
  33. Rabiner, L. and Juang, B. 1993. Fundamentals of Speech Recognition. Prentice Hall.
  34. Ramanan, D. and Forsyth, D.A. 2003. Finding and tracking people from the bottom up. In IEEE Conference on Computer Vision and Pattern Recognition, pp. II:467-474.
  35. Rao, R.P.N. and Ballard, D.H. 1995. An active vision architecture based on iconic representations. Artificial Intelligence, 78(1/2):461-505.
  36. Rivlin, E., Dickinson, S.J., and Rosenfeld, A. 1995. Recognition by functional parts. Computer Vision and Image Understanding, 62(2):164-176.
  37. Roberts, L.G. 1965. Machine perception of 3-d solids. In Optical and Electro-optical Information Processing, pp. 159-197.
  38. Rucklidge, W. 1996. Efficient Visual Recognition Using the Hausdorff Distance. Springer-Verlag, LNCS 1173.
  39. Sebastian, T.B., Klein, P.N., and Kimia, B.B. 2001. Recognition of shapes by editing shock graphs. In IEEE International Conference on Computer Vision, pp. I:755-762.
  40. Turk, M. and Pentland, A.P. 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-96.
  41. Wells, W.M. III 1986. Efficient synthesis of Gaussian filters by cascaded uniform filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):234-239.

Copyright information

© Kluwer Academic Publishers 2005

Authors and Affiliations

  • Pedro F. Felzenszwalb (1)
  • Daniel P. Huttenlocher (2)
  1. Artificial Intelligence Lab, Massachusetts Institute of Technology, USA
  2. Computer Science Department, Cornell University, USA
