Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation

Tian, Yuandong; Zitnick, C. Lawrence; Narasimhan, Srinivasa G.

doi:10.1007/978-3-642-33715-4_19

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7576)

Abstract

Human pose estimation requires a versatile yet well-constrained spatial model for grouping locally ambiguous parts together to produce a globally consistent hypothesis. Previous works either use local deformable models that deviate from a certain template, or use a global mixture representation in the pose space. In this paper, we propose a new hierarchical spatial model that can capture an exponential number of poses with a compact mixture representation on each part. Using latent nodes, it can represent high-order spatial relationships among parts with exact inference. Unlike recent hierarchical models that associate each latent node with a mixture of appearance templates (such as HOG), we use the hierarchical structure as a pure spatial prior, avoiding the large and often confounding appearance space. We verify the effectiveness of this model in three ways. First, samples representing human-like poses can be drawn from our model, showing its ability to capture high-order dependencies among parts. Second, our model reconstructs unseen poses more accurately than a nearest-neighbor pose representation. Finally, our model achieves state-of-the-art performance on three challenging datasets, and substantially outperforms recent hierarchical models.
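The two structural claims in the abstract (a compact per-part mixture that still spans exponentially many poses, and exact inference through latent nodes) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a tree whose nodes each pick one of `K` discrete mixture types, drops part locations entirely, and uses random scores. The tree layout, node names, and the functions `map_inference`, `total_score`, and `brute_force` are all hypothetical. Latent nodes get zero unary score to mimic a pure spatial prior with no appearance term.

```python
import random
from itertools import product

def map_inference(children, unary, pairwise, root):
    """Exact max-sum (dynamic programming) over a tree of discrete mixture types."""
    best = {}     # best[node][k]: best subtree score when node takes type k
    argbest = {}  # argbest[(node, child)][k]: best child type given node type k

    def up(node):
        scores = list(unary[node])
        for c in children.get(node, []):
            up(c)
            choice = []
            for k in range(len(scores)):
                cand = [pairwise[(node, c)][k][j] + best[c][j]
                        for j in range(len(best[c]))]
                j_star = max(range(len(cand)), key=cand.__getitem__)
                scores[k] += cand[j_star]
                choice.append(j_star)
            argbest[(node, c)] = choice
        best[node] = scores

    up(root)
    k_root = max(range(len(best[root])), key=best[root].__getitem__)
    assignment = {root: k_root}
    stack = [root]
    while stack:  # backtrack from the root to recover every node's type
        node = stack.pop()
        for c in children.get(node, []):
            assignment[c] = argbest[(node, c)][assignment[node]]
            stack.append(c)
    return best[root][k_root], assignment

def total_score(assignment, children, unary, pairwise):
    """Score of one full type assignment (unary terms plus parent-child terms)."""
    s = sum(unary[n][k] for n, k in assignment.items())
    for p, cs in children.items():
        for c in cs:
            s += pairwise[(p, c)][assignment[p]][assignment[c]]
    return s

def brute_force(children, unary, pairwise, nodes, K):
    """Enumerate all K**len(nodes) assignments; only feasible for toy sizes."""
    return max(total_score(dict(zip(nodes, ks)), children, unary, pairwise)
               for ks in product(range(K), repeat=len(nodes)))

# Hypothetical toy hierarchy: three latent nodes above four observed parts.
K = 3
children = {"body": ["upper", "lower"],
            "upper": ["l_arm", "r_arm"],
            "lower": ["l_leg", "r_leg"]}
latent = {"body", "upper", "lower"}
nodes = ["body", "upper", "lower", "l_arm", "r_arm", "l_leg", "r_leg"]

rng = random.Random(0)  # fixed seed so the toy scores are reproducible
unary = {n: [0.0] * K if n in latent else [rng.random() for _ in range(K)]
         for n in nodes}
pairwise = {(p, c): [[rng.random() for _ in range(K)] for _ in range(K)]
            for p, cs in children.items() for c in cs}

score, assignment = map_inference(children, unary, pairwise, "body")
print(K ** len(nodes), "joint configurations from", len(nodes), "nodes with", K, "types each")
print("best assignment:", assignment)
```

In this sketch the stored parameters grow as roughly `K * K` per edge, while the number of representable joint configurations grows as `K ** len(nodes)`. The dynamic program visits each edge once, so it finds the same optimum as exhaustive enumeration without listing every configuration.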

Author information

Authors and Affiliations

Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA
Yuandong Tian & Srinivasa G. Narasimhan
Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
C. Lawrence Zitnick

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

Cite this paper

Tian, Y., Zitnick, C.L., Narasimhan, S.G. (2012). Exploring the Spatial Hierarchy of Mixture Models for Human Pose Estimation. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7576. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33715-4_19

DOI: https://doi.org/10.1007/978-3-642-33715-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33714-7
Online ISBN: 978-3-642-33715-4
eBook Packages: Computer Science, Computer Science (R0)
