Abstract
We present quantitative evidence that a full relational model of the body performs better at upper-body parsing than the standard tree model, despite the need to adopt approximate inference and learning procedures. Our method uses an approximate search for inference and an approximate structure learning method for learning. We compare our method to state-of-the-art methods on our own dataset (which depicts a wide range of poses), on the standard Buffy dataset, and on the recently published reduced PASCAL dataset. Our results suggest that the Buffy dataset overemphasizes poses in which the arms hang down, which leads to generalization problems.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tran, D., Forsyth, D. (2010). Improved Human Parsing with a Full Relational Model. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15561-1_17
DOI: https://doi.org/10.1007/978-3-642-15561-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15560-4
Online ISBN: 978-3-642-15561-1
eBook Packages: Computer Science (R0)