Discriminative Hierarchical Part-Based Models for Human Parsing and Action Recognition

Wang, Yang; Tran, Duan; Liao, Zicheng; Forsyth, David

doi:10.1007/978-3-319-57021-1_9

Chapter
First Online: 20 July 2017

Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

We consider the problem of parsing human poses and recognizing human actions in static images with part-based models. Most previous work on part-based models considers only rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets, a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g., torso + left arm). In the extreme case, they can be whole bodies. The hierarchical poselets are organized into a hierarchy via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as an intermediate representation for other high-level tasks. We demonstrate this for action recognition from static images.

Editors: Isabelle Guyon and Vassilis Athitsos.

Notes

1. Both data sets can be downloaded from http://vision.cs.uiuc.edu/humanparse.
2. A small number of the images/annotations we obtained from the authors of Yang et al. (2010) were corrupted by a file-system failure. We have removed those images from the data set.

References

M. Andriluka, S. Roth, B. Schiele, Pictorial structures revisited: people detection and articulated pose estimation, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009
L. Bourdev, J. Malik, Poselets: body part detectors trained using 3d human pose annotations, in IEEE International Conference on Computer Vision, 2009
L. Bourdev, S. Maji, T. Brox, J. Malik, Detecting people using mutually consistent poselet activations, in European Conference on Computer Vision, 2010
C.K. Chow, C.N. Liu, Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)
N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005
V. Delaitre, I. Laptev, J. Sivic, Recognizing human actions in still images: a study of bag-of-features and part-based representations, in British Machine Vision Conference, 2010
C. Desai, D. Ramanan, C. Fowlkes, Discriminative models for static human-object interactions, in Workshop on Structured Models in Computer Vision, 2010
P. Dollár, V. Rabaud, G. Cottrell, S. Belongie, Behavior recognition via sparse spatio-temporal features, in ICCV’05 Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005
A.A. Efros, A.C. Berg, G. Mori, J. Malik, Recognizing action at a distance, in IEEE International Conference on Computer Vision, 2003, pp. 726–733
M. Eichner, V. Ferrari, Better appearance models for pictorial structures, in British Machine Vision Conference, 2009
P.F. Felzenszwalb, D.P. Huttenlocher, Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)
P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
V. Ferrari, M. Marín-Jiménez, A. Zisserman, Progressive search space reduction for human pose estimation, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008
V. Ferrari, M. Marín-Jiménez, A. Zisserman, Pose search: retrieving people using their pose, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009
D.A. Forsyth, O. Arikan, L. Ikemoto, J. O’Brien, D. Ramanan, Computational studies of human motion: part 1, tracking and motion synthesis. Found. Trends Comput. Gr. Vis. 1(2/3), 77–254 (2006)
A. Gupta, A. Kembhavi, L.S. Davis, Observing human-object interactions: using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)
N. Ikizler, R. Gokberk Cinbis, S. Pehlivan, P. Duygulu, Recognizing actions from still images, in International Conference on Pattern Recognition, 2008
N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff, Learning actions from the web, in IEEE International Conference on Computer Vision, 2009
H. Jiang, D.R. Martin, Global pose estimation using non-tree models, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008
T. Joachims, T. Finley, C.-N. Yu, Cutting-plane training of structural SVMs, in Machine Learning, 2008
S. Johnson, M. Everingham, Combining discriminative appearance and segmentation cues for articulated human pose estimation, in International Workshop on Machine Learning for Vision-based Motion Analysis, 2009
S. Johnson, M. Everingham, Clustered pose and nonlinear appearance models for human pose estimation, in British Machine Vision Conference, 2010
S.X. Ju, M.J. Black, Y. Yacoob, Cardboard people: a parameterized model of articulated image motion, in International Conference on Automatic Face and Gesture Recognition, 1996, pp. 38–44
Y. Ke, R. Sukthankar, M. Hebert, Event detection in crowded videos, in IEEE International Conference on Computer Vision, 2007
M.P. Kumar, A. Zisserman, P.H.S. Torr, Efficient discriminative learning of parts-based models, in IEEE International Conference on Computer Vision, 2009
T. Lan, Y. Wang, W. Yang, G. Mori, Beyond actions: discriminative models for contextual group activities, in Advances in Neural Information Processing Systems (MIT Press, 2010)
X. Lan, D.P. Huttenlocher, Beyond trees: common-factor models for 2d human pose recovery. IEEE Int. Conf. Comput. Vis. 1, 470–477 (2005)
I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008
S. Maji, L. Bourdev, J. Malik, Action recognition from a distributed representation of pose and appearance, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011
D. Marr, A Computational Investigation into the Human Representation and Processing of Visual Information (W. H. Freeman, San Francisco, 1982)
G. Mori, Guiding model search using segmentation. IEEE Int. Conf. Comput. Vis. 2, 1417–1423 (2005)
G. Mori, J. Malik, Estimating human body configurations using shape context matching. Eur. Conf. Comput. Vis. 3, 666–680 (2002)
G. Mori, X. Ren, A. Efros, J. Malik, Recovering human body configuration: combining segmentation and recognition. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2, 326–333 (2004)
J.C. Niebles, L. Fei-Fei, A hierarchical model of shape and appearance for human action classification, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007
J.C. Niebles, H. Wang, L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, in British Machine Vision Conference, vol. 3, 2006, pp. 1249–1258
D. Ramanan, Learning to parse images of articulated bodies. Adv. Neural Inf. Process. Syst. 19, 1129–1136 (2006)
D. Ramanan, C. Sminchisescu, Training deformable models for localization. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1, 206–213 (2006)
D. Ramanan, D.A. Forsyth, A. Zisserman, Strike a pose: tracking people by finding stylized poses. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1, 271–278 (2005)
X. Ren, A. Berg, J. Malik, Recovering human body configurations using pairwise constraints between parts. IEEE Int. Conf. Comput. Vis. 1, 824–831 (2005)
B. Sapp, C. Jordan, B. Taskar, Adaptive pose priors for pictorial structures, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010a
B. Sapp, A. Toshev, B. Taskar, Cascaded models for articulated pose estimation, in European Conference on Computer Vision, 2010b
G. Shakhnarovich, P. Viola, T. Darrell, Fast pose estimation with parameter sensitive hashing. IEEE Int. Conf. Comput. Vis. 2, 750–757 (2003)
L. Sigal, M.J. Black, Measure locally, reason globally: occlusion-sensitive articulated pose estimation. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2, 2041–2048 (2006)
V.K. Singh, R. Nevatia, C. Huang, Efficient inference with multiple heterogeneous part detectors for human pose estimation, in European Conference on Computer Vision, 2010
P. Srinivasan, J. Shi, Bottom-up recognition and parsing of the human body, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007
J. Sullivan, S. Carlsson, Recognizing and tracking human action, in European Conference on Computer Vision LNCS 2352, vol. 1, 2002, pp. 629–644
M. Sun, S. Savarese, Articulated part-based model for joint object detection and pose estimation, in IEEE International Conference on Computer Vision, 2011
T.-P. Tian, S. Sclaroff, Fast globally optimal 2d human detection with loopy graph models, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
K. Toyama, A. Blake, Probabilistic exemplar-based tracking in a metric space. IEEE Int. Conf. Comput. Vis. 2, 50–57 (2001)
D. Tran, D. Forsyth, Improved human parsing with a full relational model, in European Conference on Computer Vision, 2010
I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun, Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6, 1453–1484 (2005)
Y. Wang, G. Mori, Multiple tree models for occlusion and spatial constraints in human pose estimation, in European Conference on Computer Vision, 2008
Y. Wang, H. Jiang, M.S. Drew, Z.-N. Li, G. Mori, Unsupervised discovery of action classes, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006
Y. Wang, D. Tran, Z. Liao, Learning hierarchical poselets for human parsing, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011
W. Yang, Y. Wang, G. Mori, Recognizing human actions from still images with latent poses, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
Y. Yang, D. Ramanan, Articulated pose estimation with flexible mixtures-of-parts, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011
B. Yao, L. Fei-Fei, Modeling mutual context of object and human pose in human–object interaction activities, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010
L. Zhu, Y. Chen, Y. Lu, C. Lin, A. Yuille, Max margin AND/OR graph learning for parsing the human body, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008

Acknowledgements

This work was supported in part by NSF under IIS-0803603 and IIS-1029035, and by ONR under N00014-01-1-0890 and N00014-10-1-0934 as part of the MURI program. Yang Wang was also supported in part by an NSERC postdoc fellowship when the work was done. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of NSF, ONR, or NSERC.

Author information

Authors and Affiliations

Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
Yang Wang
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Duan Tran, Zicheng Liao & David Forsyth

Corresponding author

Correspondence to Yang Wang.

Editor information

Editors and Affiliations

University of Barcelona, Barcelona, Spain
Sergio Escalera
ChaLearn, Berkeley, California, USA
Isabelle Guyon
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
Vassilis Athitsos

About this chapter

Cite this chapter

Wang, Y., Tran, D., Liao, Z., Forsyth, D. (2017). Discriminative Hierarchical Part-Based Models for Human Parsing and Action Recognition. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_9

DOI: https://doi.org/10.1007/978-3-319-57021-1_9
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1
eBook Packages: Computer Science, Computer Science (R0)
