Discriminative Hierarchical Part-Based Models for Human Parsing and Action Recognition

Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

We consider the problem of parsing human poses and recognizing human actions in static images with part-based models. Most previous work on part-based models considers only rigid parts (e.g., torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate. In this paper, we introduce hierarchical poselets, a new representation for modeling the pose configuration of human bodies. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of the human body (e.g., torso + left arm). In the extreme case, they can be whole bodies. The hierarchical poselets are organized in a hierarchy via a structured model. Human parsing can be achieved by inferring the optimal labeling of this hierarchical model. The pose information captured by this hierarchical model can also be used as an intermediate representation for other high-level tasks. We demonstrate this with action recognition from static images.

Editors: Isabelle Guyon and Vassilis Athitsos.

Notes

  1. Both data sets can be downloaded from http://vision.cs.uiuc.edu/humanparse.

  2. A small number of the images/annotations we obtained from the authors of Yang et al. (2010) were corrupted by a file-system failure. We have removed those images from the data set.

References

  • M. Andriluka, S. Roth, B. Schiele, Pictorial structures revisited: people detection and articulated pose estimation, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009

  • L. Bourdev, J. Malik, Poselets: body part detectors trained using 3d human pose annotations, in IEEE International Conference on Computer Vision, 2009

  • L. Bourdev, S. Maji, T. Brox, J. Malik, Detecting people using mutually consistent poselet activations, in European Conference on Computer Vision, 2010

  • C.K. Chow, C.N. Liu, Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)

  • N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005

  • V. Delaitre, I. Laptev, J. Sivic, Recognizing human actions in still images: a study of bag-of-features and part-based representations, in British Machine Vision Conference, 2010

  • C. Desai, D. Ramanan, C. Fowlkes, Discriminative models for static human-object interactions, in Workshop on Structured Models in Computer Vision, 2010

  • P. Dollár, V. Rabaud, G. Cottrell, S. Belongie, Behavior recognition via sparse spatio-temporal features, in ICCV’05 Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005

  • A.A. Efros, A.C. Berg, G. Mori, J. Malik, Recognizing action at a distance, in IEEE International Conference on Computer Vision, 2003, pp. 726–733

  • M. Eichner, V. Ferrari, Better appearance models for pictorial structures, in British Machine Vision Conference, 2009

  • P.F. Felzenszwalb, D.P. Huttenlocher, Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)

  • P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan, Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)

  • V. Ferrari, M. Marín-Jiménez, A. Zisserman, Progressive search space reduction for human pose estimation, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008

  • V. Ferrari, M. Marín-Jiménez, A. Zisserman, Pose search: retrieving people using their pose, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009

  • D.A. Forsyth, O. Arikan, L. Ikemoto, J. O’Brien, D. Ramanan, Computational studies of human motion: part 1, tracking and motion synthesis. Found. Trends Comput. Gr. Vis. 1(2/3), 77–254 (2006)

  • A. Gupta, A. Kembhavi, L.S. Davis, Observing human-object interactions: using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)

  • N. Ikizler, R. Gokberk Cinbis, S. Pehlivan, P. Duygulu, Recognizing actions from still images, in International Conference on Pattern Recognition, 2008

  • N. Ikizler-Cinbis, R. Gokberk Cinbis, S. Sclaroff, Learning actions from the web, in IEEE International Conference on Computer Vision, 2009

  • H. Jiang, D.R. Martin, Global pose estimation using non-tree models, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008

  • T. Joachims, T. Finley, C.-N. Yu, Cutting-plane training of structural SVMs, in Machine Learning, 2008

  • S. Johnson, M. Everingham, Combining discriminative appearance and segmentation cues for articulated human pose estimation, in International Workshop on Machine Learning for Vision-based Motion Analysis, 2009

  • S. Johnson, M. Everingham, Clustered pose and nonlinear appearance models for human pose estimation, in British Machine Vision Conference, 2010

  • S.X. Ju, M.J. Black, Y. Yacoob, Cardboard people: a parameterized model of articulated image motion, in International Conference on Automatic Face and Gesture Recognition, 1996, pp. 38–44

  • Y. Ke, R. Sukthankar, M. Hebert, Event detection in crowded videos, in IEEE International Conference on Computer Vision, 2007

  • M.P. Kumar, A. Zisserman, P.H.S. Torr, Efficient discriminative learning of parts-based models, in IEEE International Conference on Computer Vision, 2009

  • T. Lan, Y. Wang, W. Yang, G. Mori, Beyond actions: discriminative models for contextual group activities, in Advances in Neural Information Processing Systems (MIT Press, 2010)

  • X. Lan, D.P. Huttenlocher, Beyond trees: common-factor models for 2d human pose recovery. IEEE Int. Conf. Comput. Vis. 1, 470–477 (2005)

  • I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008

  • S. Maji, L. Bourdev, J. Malik, Action recognition from a distributed representation of pose and appearance, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011

  • D. Marr, A Computational Investigation into the Human Representation and Processing of Visual Information (W. H. Freeman, San Francisco, 1982)

  • G. Mori, Guiding model search using segmentation. IEEE Int. Conf. Comput. Vis. 2, 1417–1423 (2005)

  • G. Mori, J. Malik, Estimating human body configurations using shape context matching. Eur. Conf. Comput. Vis. 3, 666–680 (2002)

  • G. Mori, X. Ren, A. Efros, J. Malik, Recovering human body configuration: combining segmentation and recognition. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2, 326–333 (2004)

  • J.C. Niebles, L. Fei-Fei, A hierarchical model of shape and appearance for human action classification, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007

  • J.C. Niebles, H. Wang, L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, in British Machine Vision Conference, vol. 3, 2006, pp. 1249–1258

  • D. Ramanan, Learning to parse images of articulated bodies. Adv. Neural Inf. Process. Syst. 19, 1129–1136 (2006)

  • D. Ramanan, C. Sminchisescu, Training deformable models for localization. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1, 206–213 (2006)

  • D. Ramanan, D.A. Forsyth, A. Zisserman, Strike a pose: tracking people by finding stylized poses. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 1, 271–278 (2005)

  • X. Ren, A. Berg, J. Malik, Recovering human body configurations using pairwise constraints between parts. IEEE Int. Conf. Comput. Vis. 1, 824–831 (2005)

  • B. Sapp, C. Jordan, B. Taskar, Adaptive pose priors for pictorial structures, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010a

  • B. Sapp, A. Toshev, B. Taskar, Cascaded models for articulated pose estimation, in European Conference on Computer Vision, 2010b

  • G. Shakhnarovich, P. Viola, T. Darrell, Fast pose estimation with parameter sensitive hashing. IEEE Int. Conf. Comput. Vis. 2, 750–757 (2003)

  • L. Sigal, M.J. Black, Measure locally, reason globally: occlusion-sensitive articulated pose estimation. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 2, 2041–2048 (2006)

  • V.K. Singh, R. Nevatia, C. Huang, Efficient inference with multiple heterogeneous part detectors for human pose estimation, in European Conference on Computer Vision, 2010

  • P. Srinivasan, J. Shi, Bottom-up recognition and parsing of the human body, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007

  • J. Sullivan, S. Carlsson, Recognizing and tracking human action, in European Conference on Computer Vision LNCS 2352, vol. 1, 2002, pp. 629–644

  • M. Sun, S. Savarese, Articulated part-based model for joint object detection and pose estimation, in IEEE International Conference on Computer Vision, 2011

  • T.-P. Tian, S. Sclaroff, Fast globally optimal 2d human detection with loopy graph models, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010

  • K. Toyama, A. Blake, Probabilistic exemplar-based tracking in a metric space. IEEE Int. Conf. Comput. Vis. 2, 50–57 (2001)

  • D. Tran, D. Forsyth, Improved human parsing with a full relational model, in European Conference on Computer Vision, 2010

  • I. Tsochantaridis, T. Joachims, T. Hofmann, Y. Altun, Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6, 1453–1484 (2005)

  • Y. Wang, G. Mori, Multiple tree models for occlusion and spatial constraints in human pose estimation, in European Conference on Computer Vision, 2008

  • Y. Wang, H. Jiang, M.S. Drew, Z.-N. Li, G. Mori, Unsupervised discovery of action classes, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006

  • Y. Wang, D. Tran, Z. Liao, Learning hierarchical poselets for human parsing, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011

  • W. Yang, Y. Wang, G. Mori, Recognizing human actions from still images with latent poses, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010

  • Y. Yang, D. Ramanan, Articulated pose estimation with flexible mixtures-of-parts, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011

  • B. Yao, L. Fei-Fei, Modeling mutual context of object and human pose in human–object interaction activities, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010

  • L. Zhu, Y. Chen, Y. Lu, C. Lin, A. Yuille, Max margin AND/OR graph learning for parsing the human body, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008

Acknowledgements

This work was supported in part by NSF under IIS-0803603 and IIS-1029035, and by ONR under N00014-01-1-0890 and N00014-10-1-0934 as part of the MURI program. Yang Wang was also supported in part by an NSERC postdoc fellowship when the work was done. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of NSF, ONR, or NSERC.

Author information

Corresponding author

Correspondence to Yang Wang.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Wang, Y., Tran, D., Liao, Z., Forsyth, D. (2017). Discriminative Hierarchical Part-Based Models for Human Parsing and Action Recognition. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_9

  • DOI: https://doi.org/10.1007/978-3-319-57021-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science (R0)
