A Local Basis Representation for Estimating Human Pose from Cluttered Images

  • Ankur Agarwal
  • Bill Triggs
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3851)


Recovering the pose of a person from a single image is a challenging problem. This paper discusses a bottom-up approach that uses local image features to estimate human upper body pose from single images with cluttered backgrounds. The method encodes the image window as a dense grid of local gradient orientation histograms, then applies non-negative matrix factorization to learn a set of bases that correspond to local features on the human body, enabling selective encoding of human-like features in the presence of background clutter. Pose is then recovered by direct regression. This approach allows us to key on gradient patterns such as shoulder contours and bent elbows that are characteristic of humans and carry important pose information, unlike current regressive methods that either use weak limb detectors or require prior segmentation to work. The system is trained on a database of images with labelled poses. We show that it estimates pose with similar performance levels to current example-based methods, but unlike them it works in the presence of natural backgrounds, without any prior segmentation.
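The three stages named in the abstract (dense gradient orientation histograms, non-negative matrix factorization, direct regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the cell size, bin count, number of bases, the single shared basis set, the Lee-Seung multiplicative updates, the ridge regressor, and the random stand-in images and poses are all assumptions chosen only to keep the example small and runnable.

```python
import numpy as np

def orientation_histograms(image, cell=8, bins=8):
    """Dense grid of magnitude-weighted gradient orientation histograms."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    bin_index = np.minimum((angle / np.pi * bins).astype(int), bins - 1)
    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hists = np.zeros((rows * cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            hists[r * cols + c] = np.bincount(
                bin_index[sl].ravel(),
                weights=magnitude[sl].ravel(),
                minlength=bins)
    return hists  # one row per grid cell

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for V ~ W @ H with W >= 0 and H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def encode(V, H, iters=200, eps=1e-9):
    """Non-negative coefficients of the rows of V on fixed bases H."""
    W = np.ones((V.shape[0], H.shape[0]))
    for _ in range(iters):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W

def ridge_fit(X, Y, lam=1e-3):
    """Linear map B minimising ||X B - Y||^2 + lam ||B||^2."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

# Synthetic stand-ins for training image windows and labelled pose vectors.
rng = np.random.default_rng(0)
images = rng.random((20, 32, 32))
poses = rng.random((20, 4))

cells = [orientation_histograms(im) for im in images]  # 16 cells x 8 bins each
V = np.vstack(cells)                                   # all cell histograms
W, H = nmf(V, rank=5)                                  # H holds the learned bases
X = np.stack([encode(c, H).ravel() for c in cells])    # per-image descriptor
B = ridge_fit(X, poses)                                # regress pose from descriptor
predicted = X @ B
```

Because the basis coefficients are constrained to be non-negative, a cell can only be described by adding learned patterns together, which is the property the abstract relies on for selectively encoding human-like structure rather than background clutter.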


Keywords: Background Clutter, SIFT Descriptor, Motion Capture Data, Negative Matrix Factorization, Cluttered Image





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ankur Agarwal (1)
  • Bill Triggs (1)
  1. GRAVIR-INRIA-CNRS, Montbonnot, France
