Abstract
In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We address the problem of using pictorial structure models to find instances of an object in an image as well as the problem of learning an object model from training examples, presenting efficient algorithms in both cases. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
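The matching problem described above can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a tree-structured model, one-dimensional integer part locations, and a quadratic spring cost, none of which the abstract states, and it uses plain quadratic-time dynamic programming rather than any speed-up. The names `match_pictorial_structure`, `unary`, `parent`, `rest_offset`, and `stiffness` are hypothetical and chosen for illustration only.

```python
import numpy as np

def match_pictorial_structure(unary, parent, rest_offset, stiffness):
    """Minimise  sum_i unary[i, l_i] + sum_i stiffness * (l_i - l_parent(i) - rest_offset[i])**2
    over integer locations l_i in [0, h), for a tree of parts.

    Assumptions (illustrative, not from the source): locations are 1-D,
    part 0 is the root, and parent[i] < i for every other part.
    """
    n, h = unary.shape
    locs = np.arange(h)
    cost = unary.astype(float).copy()          # cost[i, l]: best subtree cost with part i at l
    best_child_loc = np.zeros((n, h), dtype=int)

    # Pass 1: leaves to root. Children have larger indices than their parent,
    # so iterating downwards finishes every subtree before its parent is used.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        # pair[lp, li]: spring cost of placing child i at li when its parent is at lp
        pair = stiffness * (locs[None, :] - locs[:, None] - rest_offset[i]) ** 2
        total = pair + cost[i][None, :]
        best_child_loc[i] = total.argmin(axis=1)
        cost[p] += total.min(axis=1)

    # Pass 2: root to leaves, reading back the minimising locations.
    best = np.zeros(n, dtype=int)
    best[0] = cost[0].argmin()
    for i in range(1, n):
        best[i] = best_child_loc[i][best[parent[i]]]
    return best, float(cost[0].min())

# Toy model: three parts in a chain, each ideally 2 units to the right of its parent.
unary = np.full((3, 10), 5.0)                  # appearance mismatch cost per part and location
unary[0, 3] = unary[1, 5] = unary[2, 8] = 0.0  # each part matches well at one location
best, energy = match_pictorial_structure(unary, parent=[-1, 0, 1],
                                         rest_offset=[0, 2, 2], stiffness=1.0)
```

In this toy case the third part matches best one unit further than its spring prefers, so the minimum-energy configuration keeps all three good appearance matches and pays a small deformation cost for the stretched connection.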
References
Amini, A.A., Weymouth, T.E., and Jain, R.C. 1990. Using dynamic programming for solving variational problems in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):855-867.
Amit, Y. and Geman, D. 1999. A computational model for visual selection. Neural Computation, 11(7):1691-1715.
Ayache, N.J. and Faugeras, O.D. 1986. Hyper: A new approach for the recognition and positioning of two-dimensional objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(1):44-54.
Berger, J.O. 1985. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag.
Borgefors, G. 1986. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34(3):344-371.
Borgefors, G. 1988. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):849-865.
Boykov, Y., Veksler, O., and Zabih, R. 2001. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222-1239.
Bregler, C. and Malik, J. 1998. Tracking people with twists and exponential maps. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 8-15.
Burl, M.C. and Perona, P. 1996. Recognition of planar object classes. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 223-230.
Burl, M.C., Weber, M., and Perona, P. 1998. A probabilistic approach to object recognition using local photometry and global geometry. In European Conference on Computer Vision, pp. II:628-641.
Chow, C.K. and Liu, C.N. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462-467.
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1996. Introduction to Algorithms. MIT Press and McGraw-Hill.
Dickinson, S.J., Biederman, I., Pentland, A.P., Eklundh, J.O., Bergevin, R., and Munck-Fairwood, R.C. 1993. The use of geons for generic 3-d object recognition. In International Joint Conference on Artificial Intelligence, pp. 1693-1699.
Felzenszwalb, P.F. and Huttenlocher, D.P. 2000. Efficient matching of pictorial structures. In IEEE Conference on Computer Vision and Pattern Recognition, pp. II:66-73.
Fischler, M.A. and Bolles, R.C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395.
Fischler, M.A. and Elschlager, R.A. 1973. The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1):67-92.
Freeman, W.T. and Adelson, E.H. 1991. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906.
Gdalyahu, Y. and Weinshall, D. 1999. Flexible syntactic matching of curves and its application to automatic hierarchical classification of silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12):1312-1328.
Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721-741.
Grimson, W.E.L. and Lozano-Perez, T. 1987. Localizing overlapping parts by searching the interpretation tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(4):469-482.
Gumbel, E.J., Greenwood, J.A., and Durand, D. 1953. The circular normal distribution: Theory and tables. Journal of the American Statistical Association, 48:131-152.
Huttenlocher, D.P., Klanderman, G.A., and Rucklidge, W.J. 1993. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850-863.
Huttenlocher, D.P. and Ullman, S. 1990. Recognizing solid objects by alignment with an image. International Journal of Computer Vision, 5(2):195-212.
Ioffe, S. and Forsyth, D.A. 2001. Probabilistic methods for finding people. International Journal of Computer Vision, 43(1):45-68.
Ishikawa, H. and Geiger, D. 1998. Segmentation by grouping junctions. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 125-131.
Ju, S.X., Black, M.J., and Yacoob, Y. 1996. Cardboard people: A parameterized model of articulated motion. In International Conference on Automatic Face and Gesture Recognition, pp. 38-44.
Karzanov, A.V. 1992. Quick algorithm for determining the distances from the points of the given subset of an integer lattice to the points of its complement. Cybernetics and System Analysis, pp. 177-181. Translation from the Russian by Julia Komissarchik.
Lamdan, Y., Schwartz, J.T., and Wolfson, H.J. 1990. Affine invariant model-based object recognition. IEEE Transactions on Robotics and Automation, 6(5):578-589.
Moghaddam, B. and Pentland, A.P. 1997. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710.
Murase, H. and Nayar, S.K. 1995. Visual learning and recognition of 3-d objects from appearance. International Journal of Computer Vision, 14(1):5-24.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Pentland, A.P. 1987. Recognition by parts. In IEEE International Conference on Computer Vision, pp. 612-620.
Rabiner, L. and Juang, B. 1993. Fundamentals of Speech Recognition. Prentice Hall.
Ramanan, D. and Forsyth, D.A. 2003. Finding and tracking people from the bottom up. In IEEE Conference on Computer Vision and Pattern Recognition, pp. II:467-474.
Rao, R.P.N. and Ballard, D.H. 1995. An active vision architecture based on iconic representations. Artificial Intelligence, 78(1/2):461-505.
Rivlin, E., Dickinson, S.J., and Rosenfeld, A. 1995. Recognition by functional parts. Computer Vision and Image Understanding, 62(2):164-176.
Roberts, L.G. 1965. Machine perception of 3-d solids. In Optical and Electro-optical Information Processing, pp. 159-197.
Rucklidge, W. 1996. Efficient Visual Recognition Using the Hausdorff Distance. Springer-Verlag, LNCS 1173.
Sebastian, T.B., Klein, P.N., and Kimia, B.B. 2001. Recognition of shapes by editing shock graphs. In IEEE International Conference on Computer Vision, pp. I:755-762.
Turk, M. and Pentland, A.P. 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-96.
Wells, W.M. III 1986. Efficient synthesis of Gaussian filters by cascaded uniform filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):234-239.
Cite this article
Felzenszwalb, P.F., Huttenlocher, D.P. Pictorial Structures for Object Recognition. International Journal of Computer Vision 61, 55–79 (2005). https://doi.org/10.1023/B:VISI.0000042934.15159.49