Learning to Parse Pictures of People

  • Remi Ronfard
  • Cordelia Schmid
  • Bill Triggs
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2353)


Detecting people in images is a key problem for video indexing, browsing and retrieval. The main difficulties are the large appearance variations caused by action, clothing, illumination, viewpoint and scale. Our goal is to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands) and the geometry of their assemblies. We build on Forsyth & Fleck’s general ‘body plan’ methodology and Felzenszwalb & Huttenlocher’s dynamic programming approach for efficiently assembling candidate parts into ‘pictorial structures’. However, we replace the rather simple part detectors used in these works with dedicated detectors learned for each body part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs). We are not aware of any previous work using SVMs to learn articulated body plans; however, they have been used to detect both whole pedestrians and combinations of rigidly positioned subimages (typically upper body, arms, and legs) in street scenes, under a wide range of illumination, pose and clothing variations. RVMs are SVM-like classifiers that offer a well-founded probabilistic interpretation and improved sparsity for reduced computation. We demonstrate their benefits experimentally in a series of results showing great promise for learning detectors in more general situations.
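
As a rough illustration of the assembly step described above, and not the authors' implementation, the following sketch runs max-sum dynamic programming over a tree of body parts. Everything in it is an assumption made for the example: the function name `assemble`, the part names, the quadratic spring penalty, the `weight` parameter, and the detector scores, which merely stand in for the outputs of learned SVM or RVM part detectors. It uses a plain quadratic-time search per parent-child pair rather than any faster matching scheme.

```python
from typing import Dict, List, Tuple

# A candidate detection: (x, y, detector_score). The score is a stand-in
# for the output of a learned part classifier (hypothetical values below).
Candidate = Tuple[float, float, float]


def assemble(
    candidates: Dict[str, List[Candidate]],
    children: Dict[str, List[str]],
    offsets: Dict[Tuple[str, str], Tuple[float, float]],
    root: str,
    weight: float = 0.01,
) -> Tuple[float, Dict[str, int]]:
    """Pick one candidate per part, maximising detector scores minus
    quadratic penalties on deviations from the ideal parent-child offsets."""
    best: Dict[str, List[float]] = {}             # best[part][i]: best subtree score
    choice: Dict[Tuple[str, int, str], int] = {}  # chosen child candidate index

    def solve(part: str) -> None:
        # Solve all subtrees first (leaves to root).
        for child in children.get(part, []):
            solve(child)
        scores = []
        for i, (x, y, s) in enumerate(candidates[part]):
            total = s
            for child in children.get(part, []):
                dx, dy = offsets[(part, child)]
                options = []
                for j, (cx, cy, _) in enumerate(candidates[child]):
                    penalty = weight * ((cx - x - dx) ** 2 + (cy - y - dy) ** 2)
                    options.append((best[child][j] - penalty, j))
                value, j = max(options)
                total += value
                choice[(part, i, child)] = j
            scores.append(total)
        best[part] = scores

    solve(root)
    root_idx = max(range(len(best[root])), key=best[root].__getitem__)

    # Backtrack from the root to recover the chosen candidate for each part.
    assignment: Dict[str, int] = {}

    def trace(part: str, i: int) -> None:
        assignment[part] = i
        for child in children.get(part, []):
            trace(child, choice[(part, i, child)])

    trace(root, root_idx)
    return best[root][root_idx], assignment


# Hypothetical toy input: two torso candidates, two head candidates, one leg.
candidates = {
    "torso": [(50.0, 50.0, 1.0), (200.0, 200.0, 1.2)],
    "head": [(50.0, 20.0, 0.9), (120.0, 20.0, 1.5)],
    "leg": [(50.0, 100.0, 0.8)],
}
children = {"torso": ["head", "leg"]}
offsets = {("torso", "head"): (0.0, -30.0), ("torso", "leg"): (0.0, 50.0)}

score, assignment = assemble(candidates, children, offsets, root="torso")
print(score, assignment)
```

In this toy input the second head candidate has the higher detector score, but it sits far from where either torso candidate expects a head, so the geometric penalty outweighs it and the first head candidate is chosen.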


Object recognition, image and video indexing, grouping and segmentation, statistical pattern recognition, kernel methods


  1. C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
  2. E. Candes and D. Donoho. Curvelets - a surprisingly effective nonadaptive representation for objects with edges. In L. L. Schumaker et al., editor, Curves and Surfaces. Vanderbilt University Press, 1999.
  3. J. Coughlan, D. Snow, C. English, and A. Yuille. Efficient optimization of a deformable template using dynamic programming. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000.
  4. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and other Kernel Based Learning Methods. Cambridge University Press, 2000.
  5. M. Do and M. Vetterli. Orthonormal finite ridgelet transform for image compression. In Int. Conference on Image Processing, volume 2, pages 367–370, 2000.
  6. Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient matching of pictorial structures. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000.
  7. M. Fischler and R. Elschlager. The representation and matching of pictorial structures. IEEE Trans. Computer, C-22:67–92, 1973.
  8. D. Forsyth and M. Fleck. Body plans. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1997.
  9. B. Heisele, T. Poggio, and M. Pontil. Face detection in still gray images. Technical report, AI Memo 1687, Massachusetts Institute of Technology, 2000.
  10. D. Hogg. Model-based vision: A program to see a walking person. Image and Vision Computing, 1(1):5–20, 1983.
  11. Sergey Ioffe and David Forsyth. Human tracking with mixtures of trees. In Proc. Int. Conf. Computer Vision, 2001.
  12. Sergey Ioffe and David Forsyth. Mixtures of trees for object recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
  13. Sergey Ioffe and David Forsyth. Probabilistic methods for finding people. Int. J. of Computer Vision, 43(1), 2001.
  14. S. Ju, M. Black, and Y. Yacoob. Cardboard people: a parameterized model of articulated image motion. In Int. Conference on Automatic Face and Gesture Recognition, 1996.
  15. D. Marr and H.K. Nishihara. Representation and recognition of the spatial organization of three dimensional structure. Proceedings of the Royal Society of London B, 200:269–294, 1978.
  16. Anuj Mohan, Constantine Papageorgiou, and Tomaso Poggio. Example-based object detection in images by components. IEEE Trans. Pattern Analysis and Machine Intelligence, 23(4), 2001.
  17. D. Morris and J. Rehg. Singularity analysis for articulated object tracking. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1998.
  18. C. Papageorgiou. Object and pattern detection in video sequences. Technical report, Master’s thesis, Massachusetts Institute of Technology, 1997.
  19. K. Rohr. Incremental recognition of pedestrians from image sequences. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 8–13, 1993.
  20. H. Sidenbladh and M. Black. Learning the statistics of people in images and video. Int. Journal of Computer Vision, 2001.
  21. H. Sidenbladh, F. Torre, and M. Black. A framework for modeling the appearance of 3D articulated figures. In Int. Conference on Automatic Face and Gesture Recognition, 2000.
  22. M. Tipping. The relevance vector machine. In Advances in Neural Information Processing Systems. Morgan Kaufmann, 2000.
  23. M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.
  24. V. Vapnik. Statistical Learning Theory. Wiley, 1998.
  25. Liang Zhao and Chuck Thorpe. Recursive context reasoning for human detection and parts identification. In IEEE Workshop on Human Modeling, Analysis and Synthesis, 2000.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Remi Ronfard (1)
  • Cordelia Schmid (1)
  • Bill Triggs (1)
  1. INRIA, Montbonnot, France
