Abstract
Detecting people in images is a key problem for video indexing, browsing and retrieval. The main difficulties are the large appearance variations caused by action, clothing, illumination, viewpoint and scale. Our goal is to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands) and the geometry of their assemblies. We build on Forsyth & Fleck’s general ‘body plan’ methodology and Felzenszwalb & Huttenlocher’s dynamic programming approach for efficiently assembling candidate parts into ‘pictorial structures’. However, we replace the rather simple part detectors used in these works with dedicated detectors learned for each body part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs). We are not aware of any previous work using SVMs to learn articulated body plans; however, SVMs have been used to detect both whole pedestrians and combinations of rigidly positioned subimages (typically upper body, arms, and legs) in street scenes, under a wide range of illumination, pose and clothing variations. RVMs are SVM-like classifiers that offer a well-founded probabilistic interpretation and improved sparsity, which reduces computation. We demonstrate their benefits experimentally in a series of results that show great promise for learning detectors in more general situations.
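The assembly step mentioned above, which picks one candidate location per body part so that detector scores and pairwise geometry are jointly optimal over a tree-shaped body plan, can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: the functions `assemble` and `deformation`, the part names, coordinates, detector scores, ideal offsets and the quadratic `stiffness` penalty are all hypothetical. In the paper the part scores would come from the learned SVM or RVM detectors; here they are made-up numbers. The sketch also minimizes over all parent/child candidate pairs directly (quadratic per edge) rather than using the faster distance-transform technique of Felzenszwalb & Huttenlocher.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
# Hypothetical input: for each part, a list of (location, detector score) candidates.
Candidates = Dict[str, List[Tuple[Point, float]]]


def deformation(parent: Point, child: Point, offset: Point, stiffness: float) -> float:
    """Assumed quadratic penalty for deviating from the ideal parent-to-child offset."""
    dx = child[0] - parent[0] - offset[0]
    dy = child[1] - parent[1] - offset[1]
    return stiffness * (dx * dx + dy * dy)


def assemble(parts: Candidates, children: Dict[str, List[str]],
             offsets: Dict[str, Point], root: str,
             stiffness: float = 1.0) -> Tuple[float, Dict[str, Point]]:
    """Minimum-cost assembly of a tree of parts by dynamic programming.

    Unary cost of a candidate = minus its detector score; edge cost = deformation.
    Assumes every part has at least one candidate and `children` forms a tree.
    Returns the total cost and the chosen location of every part.
    """
    # back[c][i] = index of the best candidate of child c, given parent candidate i.
    back: Dict[str, List[int]] = {}

    def subtree_costs(v: str) -> List[float]:
        # Cost of the best sub-assembly rooted at each candidate of part v (bottom-up).
        costs = [-score for _, score in parts[v]]
        for c in children.get(v, []):
            child_costs = subtree_costs(c)
            back[c] = []
            for i, (pv, _) in enumerate(parts[v]):
                totals = [child_costs[j] + deformation(pv, pc, offsets[c], stiffness)
                          for j, (pc, _) in enumerate(parts[c])]
                j_best = min(range(len(totals)), key=totals.__getitem__)
                back[c].append(j_best)
                costs[i] += totals[j_best]
        return costs

    root_costs = subtree_costs(root)
    i_root = min(range(len(root_costs)), key=root_costs.__getitem__)

    # Backtrack top-down from the best root candidate.
    chosen = {root: i_root}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            chosen[c] = back[c][chosen[v]]
            stack.append(c)
    placement = {v: parts[v][i][0] for v, i in chosen.items()}
    return root_costs[i_root], placement


# Toy example with made-up candidates: the head candidate at (90, 90) has the
# highest detector score but is rejected because it is geometrically implausible.
parts = {
    "torso": [((50, 50), 2.0), ((80, 20), 1.5)],
    "head": [((50, 30), 1.0), ((90, 90), 3.0)],
    "left_arm": [((35, 55), 0.8), ((10, 10), 0.9)],
}
children = {"torso": ["head", "left_arm"]}
offsets = {"head": (0, -20), "left_arm": (-15, 5)}
cost, placement = assemble(parts, children, offsets, "torso", stiffness=0.01)
print(cost, placement)
```

Because the body plan is a tree, each child's best placement depends only on its parent's placement, so the joint optimum is found exactly in one bottom-up pass plus one top-down backtrack, with no search over all combinations of candidates.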
This work was supported by the European Union FET-Open research project VIBES.
References
C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
E. Candès and D. Donoho. Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In L. L. Schumaker et al., editors, Curves and Surfaces. Vanderbilt University Press, 1999.
J. Coughlan, D. Snow, C. English, and A. Yuille. Efficient optimization of a deformable template using dynamic programming. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and other Kernel Based Learning Methods. Cambridge University Press, 2000.
M. Do and M. Vetterli. Orthonormal finite ridgelet transform for image compression. In Int. Conference on Image Processing, volume 2, pages 367–370, 2000.
P. F. Felzenszwalb and D. P. Huttenlocher. Efficient matching of pictorial structures. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000.
M. Fischler and R. Elschlager. The representation and matching of pictorial structures. IEEE Trans. Computers, C-22:67–92, 1973.
D. Forsyth and M. Fleck. Body plans. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997.
B. Heisele, T. Poggio, and M. Pontil. Face detection in still gray images. AI Memo 1687, Massachusetts Institute of Technology, 2000.
D. Hogg. Model-based vision: A program to see a walking person. Image and Vision Computing, 1(1):5–20, 1983.
S. Ioffe and D. Forsyth. Human tracking with mixtures of trees. In Proc. Int. Conf. Computer Vision, 2001.
S. Ioffe and D. Forsyth. Mixtures of trees for object recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
S. Ioffe and D. Forsyth. Probabilistic methods for finding people. Int. J. of Computer Vision, 43(1), 2001.
S. Ju, M. Black, and Y. Yacoob. Cardboard people: a parameterized model of articulated image motion. In Int. Conference on Automatic Face and Gesture Recognition, 1996.
D. Marr and H. K. Nishihara. Representation and recognition of the spatial organization of three dimensional structure. Proceedings of the Royal Society of London B, 200:269–294, 1978.
A. Mohan, C. Papageorgiou, and T. Poggio. Example-based object detection in images by components. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(4), 2001.
D. Morris and J. Rehg. Singularity analysis for articulated object tracking. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998.
C. Papageorgiou. Object and pattern detection in video sequences. Master’s thesis, Massachusetts Institute of Technology, 1997.
K. Rohr. Incremental recognition of pedestrians from image sequences. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 8–13, 1993.
H. Sidenbladh and M. Black. Learning the statistics of people in images and video. Int. Journal of Computer Vision, 2001.
H. Sidenbladh, F. De la Torre, and M. Black. A framework for modeling the appearance of 3D articulated figures. In Int. Conference on Automatic Face and Gesture Recognition, 2000.
M. Tipping. The relevance vector machine. In Advances in Neural Information Processing Systems. Morgan Kaufmann, 2000.
M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.
V. Vapnik. Statistical Learning Theory. Wiley, 1998.
L. Zhao and C. Thorpe. Recursive context reasoning for human detection and parts identification. In IEEE Workshop on Human Modeling, Analysis and Synthesis, 2000.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ronfard, R., Schmid, C., Triggs, B. (2002). Learning to Parse Pictures of People. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47979-1_47
DOI: https://doi.org/10.1007/3-540-47979-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43748-2
Online ISBN: 978-3-540-47979-6
eBook Packages: Springer Book Archive