International Journal of Computer Vision, Volume 54, Issue 1–3, pp 183–209

Learning the Statistics of People in Images and Video

  • Hedvig Sidenbladh
  • Michael J. Black

Abstract

This paper addresses the problems of modeling the appearance of humans and distinguishing human appearance from the appearance of general scenes. We seek a model of appearance and motion that is generic in that it accounts for the ways in which people's appearance varies and, at the same time, is specific enough to be useful for tracking people in natural scenes. Given a 3D model of the person projected into an image, we model the likelihood of observing various image cues conditioned on the predicted locations and orientations of the limbs. These cues are taken to be steered filter responses corresponding to edges, ridges, and motion-compensated temporal differences. Motivated by work on the statistics of natural scenes, the statistics of these filter responses for human limbs are learned from training images containing hand-labeled limb regions. Similarly, the statistics of the filter responses in general scenes are learned to define a “background” distribution. The likelihood of observing a scene given a predicted pose of a person is computed, for each limb, using the likelihood ratio between the learned foreground (person) and background distributions. Adopting a Bayesian formulation allows cues to be combined in a principled way. Furthermore, the use of learned distributions obviates the need for hand-tuned image noise models and thresholds. The paper provides a detailed analysis of the statistics of how people appear in scenes and provides a connection between work on natural image statistics and the Bayesian tracking of people.
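The foreground/background likelihood ratio described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single scalar cue, histogram-based density estimates, independence between the responses sampled along a limb, and synthetic Gaussian "filter responses" standing in for hand-labeled training data. All function names, bin settings, and distribution parameters are hypothetical.

```python
import math
import random


def bin_index(r, n_bins, lo, hi):
    """Map a response to a histogram bin, clamping values outside [lo, hi)."""
    width = (hi - lo) / n_bins
    return min(n_bins - 1, max(0, int((r - lo) / width)))


def learn_histogram(responses, n_bins, lo, hi):
    """Normalized histogram with add-one smoothing, so no bin has zero mass."""
    counts = [1.0] * n_bins
    for r in responses:
        counts[bin_index(r, n_bins, lo, hi)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood_ratio(responses, p_fg, p_bg, lo, hi):
    """Sum of log(p_fg / p_bg) over responses, assuming they are independent."""
    n_bins = len(p_fg)
    score = 0.0
    for r in responses:
        i = bin_index(r, n_bins, lo, hi)
        score += math.log(p_fg[i]) - math.log(p_bg[i])
    return score


# Synthetic stand-ins for training data (assumption): responses on limbs are
# drawn from N(2, 1), responses in general scenes from N(0, 1).
rng = random.Random(0)
N_BINS, LO, HI = 20, -4.0, 6.0
train_fg = [rng.gauss(2.0, 1.0) for _ in range(5000)]
train_bg = [rng.gauss(0.0, 1.0) for _ in range(5000)]

p_fg = learn_histogram(train_fg, N_BINS, LO, HI)
p_bg = learn_histogram(train_bg, N_BINS, LO, HI)

# Score two hypothetical limb placements: one lying on a limb, one on background.
on_limb = [rng.gauss(2.0, 1.0) for _ in range(50)]
off_limb = [rng.gauss(0.0, 1.0) for _ in range(50)]
fg_score = log_likelihood_ratio(on_limb, p_fg, p_bg, LO, HI)
bg_score = log_likelihood_ratio(off_limb, p_fg, p_bg, LO, HI)
print(fg_score > 0, bg_score < 0)
```

In this sketch a positive score favors the hypothesis that the sampled region is a limb, and a negative score favors background. If several cues were treated as conditionally independent (an assumption made here only for illustration), their log ratios would simply be added.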

Keywords: human tracking, image statistics, Bayesian inference, articulated models, multiple cues, likelihood models

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Hedvig Sidenbladh (1)
  • Michael J. Black (2)
  1. Computational Vision and Active Perception Laboratory, Department of Numerical Analysis and Computer Science, KTH, Stockholm, Sweden
  2. Department of Computer Science, Brown University, Providence, USA
